Abstract

The contribution of this paper is to introduce change-of-measure based techniques for the rare-event analysis of heavy-tailed stochastic processes. Our changes of measure are parameterized by a family of distributions admitting a mixture form. We exploit our methodology to achieve two types of results. First, we construct Monte Carlo estimators that are strongly efficient (i.e. have bounded relative mean squared error as the event of interest becomes rare). These estimators are used to estimate both rare-event probabilities of interest and associated conditional expectations. We emphasize that our techniques allow us to control the expected termination time of the Monte Carlo algorithm even if the conditional expected stopping time (under the original distribution) given the event of interest is infinite, a situation that sometimes occurs in heavy-tailed settings. Second, the mixture family serves as a good approximation (in total variation) of the conditional distribution of the whole process given the rare event of interest. The convenient form of the mixture family allows us to obtain, as a corollary, functional conditional central limit theorems that extend classical results in the literature. We illustrate our methodology in the context of the ruin probability $P(\sup_n S_n > b)$, where $S_n$ is a random walk with heavy-tailed increments that have negative drift. Our techniques are based on the use of Lyapunov inequalities for variance control and termination time. The conditional limit theorems combine the application of Lyapunov bounds with coupling arguments.



1 Introduction 

Change-of-measure techniques constitute a cornerstone in the large deviations analysis of stochastic processes (see for instance [Ej). In the light-tailed setting, it is well understood that a specific class of changes of measure, namely exponential tilting, provides just the right vehicle not only to perform large deviations analysis but also to design provably efficient importance sampling simulation estimators. There is a wealth of literature on structural results, such as conditional limit theorems, that justify the use of exponential changes of measure in these settings (see for instance [H |S] in the setting of random walks and [2U] in the context of networks).

Our contribution in this paper is the introduction of change-of-measure techniques for the rare-event analysis of heavy-tailed stochastic processes. Our general motivation is to put forward tools that allow one to perform large deviations analysis for heavy-tailed systems and, at the same time, to construct efficient Monte Carlo algorithms for the estimation of rare events, in the same spirit as in light-tailed settings. To this end, we introduce a family of changes of measure that are parameterized by a mixture of finitely many distributions and develop mathematical tools for their analysis. We concentrate on a class of problems of interest both in queueing theory and risk theory, namely first passage time probabilities for random walks, which serve as a good stylized model for testing and explaining techniques at the interface of large deviations and simulation. For instance, the first paper ([21]) that introduced the notions of efficiency, together with the application of light-tailed large deviations ideas and exponential changes of measure, focused on this class of model problems. Such notions are now standard in rare-event simulation. In the heavy-tailed setting, first passage time problems for random walks also serve as an environment for explaining the challenges that arise when trying to develop efficient importance sampling estimators (see [3]). We will provide additional discussion on those challenges and contrast our methods with recent approaches that have been developed for first passage time problems for heavy-tailed random walks. We will illustrate the flexibility of our method in terms of simulation estimators that have good variance performance and good control on the cost per replication of the simulation estimator. The proposed change of measure also satisfies structural results (in the form of conditional limit theorems) in the spirit of the theory that has been developed in light-tailed environments. Let us introduce the setup that will be the focus of our paper.

Let $S = \{S_n : n \ge 0\}$ be a random walk with independent and identically distributed (i.i.d.) increments $\{X_n : n \ge 1\}$; that is, $S_{n+1} = S_n + X_{n+1}$ for all $n \ge 0$ and $S_0 = 0$. We assume that $\mu = EX_n < 0$ and that the $X_n$'s are suitably heavy-tailed (see Section 2). For each $b \in \mathbb{R}^+$, let $\tau_b = \inf\{n \ge 1 : S_n > b\}$. Of interest in this paper is the first passage time probability

$$u(b) = P(\tau_b < \infty) \tag{1}$$

and the conditional distribution of the random walk given $\{\tau_b < \infty\}$, namely

$$P(S \in \cdot \mid \tau_b < \infty). \tag{2}$$

^1 If $S_0 = 0$ we use $P(\cdot)$ and $E(\cdot)$ to denote the associated probability measure and expectation operators in path space, respectively. If $S_0 = s$, then we write $P_s(\cdot)$ and $E_s(\cdot)$.

This paper introduces a family of unbiased simulation estimators for $u(b)$ that can be shown to have bounded coefficient of variation uniformly over $b > 0$. The associated sampling distribution approximates (2) in total variation as $b \to \infty$. Unbiased estimators with bounded coefficient of variation are called strongly efficient estimators in rare-event simulation (Chapter 6 in [1]).

The construction of provably efficient importance sampling estimators has been the focus of many papers in the applied probability literature. A natural idea behind the construction of efficient importance sampling estimators is that one should mimic the behavior of the zero-variance change of measure, which coincides precisely with the conditional distribution (2). As is well known, heavy-tailed large deviations are often governed by the "principle of the big jump", which, qualitatively speaking, indicates that asymptotically as $b \to \infty$ the event of interest (in our case $\{\tau_b < \infty\}$) occurs due to the contribution of a single large increment of size $\Omega(b)$. Consequently, the principle of the big jump naturally suggests mimicking the zero-variance change of measure by a distribution which assigns zero probability to the event that ruin occurs due to the contribution of more than one large jump of order $\Omega(b)$. However, such an importance sampling strategy is not feasible because it violates the absolute continuity requirements needed to define a likelihood ratio. This is the most obvious problem that arises in the construction of efficient importance sampling schemes for heavy-tailed problems. A more subtle problem, discussed in [3], is the fact that the second moment of an importance sampling estimator for heavy-tailed large deviations is often very sensitive to the behavior of the likelihood ratio precisely on paths that exhibit more than one large jump for the occurrence of the rare event in question. We shall refer to those paths that require more than one large jump for the occurrence of the event $\{\tau_b < \infty\}$ as rogue paths.

In the last few years state-dependent importance sampling has been used as a viable way to construct estimators for heavy-tailed rare-event simulation. A natural idea is to exploit the Markovian representation of (2) in terms of the so-called Doob's h-transform. In particular, it is well known that

$$P(X_{n+1} \in dx \mid S_n, n < \tau_b < \infty) = \frac{u(b - S_n - x)}{u(b - S_n)} F(dx), \tag{3}$$

where $F$ is the distribution of $X_{n+1}$. In [10], a state-dependent importance sampling estimator based on an approximation to (3) is constructed and a technique based on Lyapunov inequalities was introduced for variance control. In particular, by constructing a suitable Lyapunov function, in [10] it is shown that if $v(b - s)$ is a suitable approximation to $u(b - s)$ as $b - s \to \infty$ and $w(b - s) = Ev(b - s - X)$, then simulating the increment $X_{n+1}$ given $S_n$ and $\tau_b > n$ via the distribution

$$P(X_{n+1} \in dx \mid S_n) = \frac{v(b - S_n - x)}{w(b - S_n)} F(dx) \tag{4}$$

provides a strongly efficient estimator for $u(b)$. This approach provided the first provably efficient estimator for $u(b)$ in the context of a general class of heavy-tailed increment distributions, the class $\mathcal{S}^*$, which includes in particular Weibull and regularly varying distributions. Despite the fact that the importance sampling strategy induced by (4) has been proved to be efficient in substantial generality, it has a few inconvenient features. First, it typically requires numerically evaluating $w(b - S_n)$ for each $S_n$ during the course of the algorithm. Although this issue does not appear to be too critical in the one-dimensional setting (see the analysis in [12]), for higher-dimensional problems the numerical evaluation of $w(b - S_n)$ could easily require a significant computational overhead. For instance, see the first passage time computations for multiserver queues, which have been studied in the regularly varying case in [TT]. The second inconvenient feature is that if the increments have finite mean but infinite variance we obtain $E(\tau_b \mid \tau_b < \infty) = \infty$. The strategy of mimicking the conditional distribution without paying attention to the cost per replication of the estimator could yield a poor overall computational complexity. Our proposed approach does not suffer from this drawback because our parametric family of changes of measure allows us to control both the variance and the termination time.

^2 For $f(\cdot)$ and $g(\cdot)$ non-negative we use the notation $f(b) = O(g(b))$ if $f(b) \le c\,g(b)$ for some $c \in (0, \infty)$. Similarly, $f(b) = \Omega(g(b))$ if $f(b) \ge c\,g(b)$, and we also write $f(b) = o(g(b))$ as $b \to \infty$ if $f(b)/g(b) \to 0$ as $b \to \infty$.

We now proceed to explicitly summarize the contributions of this paper. Further discussion will be given momentarily and precise mathematical statements are given in Section 2.

1. We provide a strongly efficient estimator (i.e. bounded relative mean squared error as $b \to \infty$) to compute the rare-event probabilities and the associated conditional expectations, based on a finite mixture family, for which both the simulation and density evaluation are straightforward to perform (see Theorem 1). Several features of the algorithm include:

(a) The results require the distribution to have an eventually concave cumulative hazard function, which covers a large class of distributions including regularly varying, Weibull and log-normal distributions (see the assumptions in Section 2).

(b) One feature of the proposed algorithm relates to the termination time. When the increments are regularly varying with tail index $\iota \in (1,2)$, $E(\tau_b \mid \tau_b < \infty) = \infty$. This implies that the zero-variance change of measure takes infinite expected time to generate one sample. In contrast, we show that the proposed importance sampling algorithm takes $O(b)$ expected time to generate one sample while still maintaining strong efficiency if $\iota \in (1.5, 2)$ (Theorem 3).

(c) For the case that $\iota \in (1, 1.5]$, we show that the $(1+\gamma)$-th moment of the estimator is of order $O(u^{1+\gamma}(b))$ with $\gamma > 0$ depending on $\iota$. In addition, the expected termination time of the algorithm is $O(b)$ (Theorem 4). Therefore, to compute $u(b)$ with $\varepsilon$ relative error and at least $1 - \delta$ probability, the total computational complexity is $O(b)$.

2. The mixture family approximates the conditional distribution of the random walk given ruin in total variation. Based on this strong approximation and on the simplicity of the mixture family's form we derive a conditional functional central limit theorem of the random walk given ruin, which further extends existing results reported in [9] (compare Theorem 2, Theorem 5 and the results that follow it below).

As mentioned earlier, the simulation estimators proposed in this paper are based on importance sampling and they are designed to directly mimic the conditional distribution of $S$ given $\tau_b < \infty$ based on the principle of the big jump. This principle suggests that one should mimic the behavior of such a conditional distribution at each step by a mixture of two components: one involving an increment distribution that is conditioned to reach level $b$ and a second one corresponding to a nominal (unconditional) increment distribution. This two-mixture sampler, which was introduced by [18] in the context of tail estimation of a fixed sum of heavy-tailed random variables, has been shown to produce strongly efficient estimators for regularly varying distributions [TSl dU [12]. However, two-component mixtures are not suitable for the design of strongly efficient estimators in the context of other types of heavy-tailed distributions. In particular, two-component mixtures are not applicable to semiexponential distributions (see [16] for the definition) such as Weibull.

As indicated, one of our main contributions in this paper is to introduce a generalized finite-mixture sampler that can be shown to be suitable for constructing strongly efficient estimators in the context of a general class of heavy-tailed distributions, beyond regularly varying tails and including lognormal and Weibullian-type tails. Our mixture family also mimics the qualitative behavior mentioned above; namely, there is the contribution of a large jump and the contribution of a regular jump. In addition, one needs to control the behavior of the likelihood ratio corresponding to rogue sample paths. Depending on the degree of concavity of the cumulative hazard function (which we assume to be eventually strictly concave) we must interpolate between the large jump component and the nominal component in a suitable way. In the end, the number of mixture components is larger for cumulative hazard functions that are less concave.

Our mixture family and our Lyapunov-based analysis allow us to obtain an importance sampling scheme that achieves strong efficiency and controlled expected termination time even if the optimal (in terms of variance minimization) change of measure involves an infinite expected termination time. More precisely, if the increment distribution is regularly varying with tail index $\iota \in (1,2)$, it follows using the Pakes-Veraverbeke theorem (see Theorem 7) that

$$E(\tau_b \mid \tau_b < \infty) = \infty.$$

Nevertheless, as we will show, if $\iota \in (1.5, 2]$ we can choose the mixture parameters (which are state-dependent) in such a way that (using $Q(\cdot)$ to denote the probability measure induced by our importance sampling strategy assuming $S_0 = 0$)

$$E^Q \tau_b = O(b) \tag{5}$$

while maintaining strong efficiency. We believe this feature is surprising! In particular, it implies that one can construct a family of estimators for expectations of the form $E(H(S_k : k \le \tau_b) \mid \tau_b < \infty)$ that requires overall $O(b)$ random numbers generated, uniformly over a class of functions such that $0 < K_0 \le H \le K_1 < \infty$, even if $E(\tau_b \mid \tau_b < \infty) = \infty$. We shall also informally explain why $\iota > 1.5$ appears to be a necessary condition in order to construct an unbiased estimator satisfying both strong efficiency and (5).

In addition, for the case that $\iota \in (1, 1.5]$, we are able to construct an estimator whose $(1+\gamma)$-th moment (for $0 < \gamma < (\iota - 1)/(2 - \iota)$) is of order $O(u^{1+\gamma}(b))$ while the expected termination time is $O(b)$. We will also argue that the bound on $\gamma$ is essentially optimal. Consequently, as is shown in Theorem 4, to compute $u(b)$ with $\varepsilon$ relative error and at least $1 - \delta$ probability, the total computational complexity is $O(b)$.
In addition to providing a family of strongly efficient estimators for $u(b)$, our finite-mixture family can approximate the conditional measure (2) in total variation as $b \to \infty$. This approximation step further strengthens our family of samplers as a natural rare-event simulation scheme for heavy-tailed systems. Moreover, given the strong mode of convergence and because the mixture family admits a friendly form, we are able to strengthen classical results in the literature on heavy-tailed approximations, see [9]. For instance, if a given increment has finite second moment, we will derive, as a corollary of our approximations, a conditional functional central limit theorem up to the first passage time $\tau_b$. This improves the law of large numbers derived in [9]. Another related result in the setting of high-dimensional regularly varying random walks is given in [22]. We believe that the proof techniques behind our approximations, which are based on coupling arguments, are of independent interest and that they can be used in other heavy-tailed environments.

A central technique in the analysis of both the computational complexity and our conditional limit theorems is the use of Lyapunov functions. The Lyapunov functions are used for three different purposes: first, in showing the strong efficiency of the importance sampling estimator; second, in providing a bound on the expected termination time of the algorithm; and finally, in proving the approximation in total variation of the zero-variance change of measure. The construction of Lyapunov functions follows the so-called fluid heuristic, which is well known in the literature of heavy-tailed large deviations and has also been successfully applied in rare-event simulation, see [I5l [HI [13].

This paper is organized as follows. In Section 2 we introduce our assumptions and our family of changes of measure, and we provide precise mathematical statements of our main results. Section 3 discusses some background results on large deviations and Lyapunov inequalities for importance sampling and stability of Markov processes. The variance analysis of our estimators is given in Section 4. The results corresponding to the termination time of our algorithm can be found in Section 5. Our results on strong conditional limit theorems are given in Section 6. We provide numerical experiments in Section 7. Finally, we add an appendix which contains auxiliary lemmas and technical results.



2 Main Results

We shall use $X$ to denote a generic random variable with the same distribution as any of the $X_i$'s describing the random walk $S_n = \sum_{i=1}^n X_i$, $n = 1, 2, \ldots$, with $S_0 = 0$. We write $F(x) = P(X \le x)$, $\bar F(x) = P(X > x)$ and $EX = \mu \in (-\infty, 0)$. Further, let $\Lambda(\cdot)$ be the cumulative hazard function and $\lambda(\cdot)$ be the hazard function. Therefore, $F$ has density function, for $x \in (-\infty, \infty)$,

$$f(x) = \lambda(x) e^{-\Lambda(x)}, \quad \text{and} \quad \bar F(x) = e^{-\Lambda(x)}.$$

Of primary interest to us is the design of efficient importance sampling (change-of-measure based) estimators for

$$u(b) = P\Big(\sup_{n \ge 1} S_n > b\Big) \tag{6}$$

as $b \to \infty$ when $F$ is suitably heavy-tailed. In particular, throughout this paper we shall assume either of the following two sets of conditions:
Assumption A: $F$ has a regularly varying right tail with index $\iota > 1$. That is,

$$\bar F(x) = 1 - F(x) = L(x) x^{-\iota},$$

where $L(\cdot)$ is a slowly varying function at infinity, that is, $\lim_{x \to \infty} L(tx)/L(x) = 1$ for all $t \in (0, 1]$.

Or

Assumption B: There exists $b_0 > 0$ such that for all $x \ge b_0$ the following conditions hold.

B1 Suppose that $\lim_{x \to \infty} x\lambda(x) = \infty$.

B2 There exists $\beta_0 \in (0, 1)$ such that $\partial \log \Lambda(x) = \lambda(x)/\Lambda(x) \le \beta_0 x^{-1}$ for $x \ge b_0$.

B3 Assume that $\Lambda(\cdot)$ is concave for all $x \ge b_0$; equivalently, $\lambda(\cdot)$ is assumed to be non-increasing for $x \ge b_0$.

B4 Assume that

$$P(X > x + t/\lambda(x) \mid X > x) = \exp(-t)(1 + o(1))$$

as $x \to \infty$ uniformly over compact sets in $t \ge 0$. In addition, for some $\alpha > 1$, $P(X > x + t/\lambda(x) \mid X > x) \le$ for all $t, x \ge b_0$.

Remark 1 The analysis requires $\Lambda(\cdot)$ to be differentiable only for $x \ge b_0$. The reason for introducing Assumptions A and B separately is that the analysis for regularly varying distributions is somewhat different from (easier than) the cases under Assumption B. Assumption B1 implies that the tail of $X$ decays faster than any polynomial. Assumptions B2 and B3 basically say that the cumulative hazard function of $F$ is "more concave" than at least some Weibull distribution with shape parameter $\beta_0 < 1$. Typically, the more concave the cumulative hazard function is, the heavier the tail is. Therefore, under Assumption B, $F$ is basically assumed to have a heavier tail than at least some Weibull distribution with shape parameter $\beta_0 < 1$. Assumption B4 is required only in the theorem that states the functional central limit theorem of the conditional random walk given ruin. Note that Assumptions A and B cover a wide range of heavy-tailed distributions that are popular in practice, for instance regularly varying, log-normal, and Weibull with shape parameter $\beta_0 \in (0, 1)$.
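As an illustration of how Assumptions B1 to B3 can be checked, consider a right tail of Weibull type. The specific form below (in particular $\Lambda(x) = x^{\beta}$ for large $x$, with the hypothetical shape parameter $\beta \in (0,1)$) is an assumption made only for this worked example; only the right tail is specified.

```latex
% Worked check for a Weibull-type tail (illustrative assumption):
% \Lambda(x) = x^{\beta}, \lambda(x) = \beta x^{\beta-1}, 0 < \beta < 1, x large.
\begin{align*}
\text{B1:}\quad & x\lambda(x) = \beta x^{\beta} \to \infty
   \quad \text{as } x \to \infty, \\
\text{B2:}\quad & \frac{\lambda(x)}{\Lambda(x)}
   = \frac{\beta x^{\beta-1}}{x^{\beta}} = \beta x^{-1}
   \le \beta_0 x^{-1} \quad \text{with } \beta_0 = \beta \in (0,1), \\
\text{B3:}\quad & \lambda'(x) = \beta(\beta-1)x^{\beta-2} < 0,
   \quad \text{so } \lambda \text{ is non-increasing and } \Lambda \text{ is concave.}
\end{align*}
% First part of B4: with h = t/(\beta x^{\beta}) \to 0,
\begin{align*}
\Lambda\big(x + t/\lambda(x)\big) - \Lambda(x)
  = x^{\beta}\big[(1+h)^{\beta} - 1\big]
  = x^{\beta}\big[\beta h + O(h^2)\big] \to t,
\end{align*}
% hence P(X > x + t/\lambda(x) | X > x) = \exp(-t)(1 + o(1)).
```

In this example the Weibull shape parameter itself plays the role of $\beta_0$, which matches the comparison with Weibull tails made in Remark 1.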

In our random walk context, state-dependent importance sampling involves studying a family of densities (depending on the "current" state $s$ of the random walk) which governs subsequent increments of the random walk. More precisely, we write

$$q_s(x) = r_s(x)^{-1} f(x),$$

where $r_s(\cdot)$ is a non-negative function such that $E r_s^{-1}(X) = 1$, for a generic family of state-dependent importance sampling increment distributions. If we let $Q(\cdot)$ represent the probability measure in path space induced by the subsequent generation of increments under $q_s(\cdot)$, then it follows easily that

$$u(b) = E^Q[I(\tau_b < \infty) L_b],$$
with

$$L_b = \prod_{j=1}^{\tau_b} r_{S_{j-1}}(S_j - S_{j-1}). \tag{7}$$

We say that

$$Z_b = I(\tau_b < \infty) L_b \tag{8}$$

is an importance sampling estimator for $u(b)$ and its second moment is simply

$$E^Q[I(\tau_b < \infty) L_b^2] = E[I(\tau_b < \infty) L_b].$$

If we select $Q(\cdot) = P(\cdot \mid \tau_b < \infty)$, or equivalently we let $r_s(x) = u(b - s)/u(b - s - x)$, then the corresponding importance sampling estimator would yield zero variance. Hence, we call it the zero-variance importance sampling estimator, and we call $P(\cdot \mid \tau_b < \infty)$ the zero-variance change of measure or zero-variance importance sampling distribution.
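The zero-variance claim can be verified by a short telescoping computation. The sketch below assumes the convention $u(y) = 1$ for $y < 0$ (ruin has already occurred once the walk is above the barrier); this convention is an assumption of the illustration rather than something stated explicitly above.

```latex
% Zero-variance check, assuming r_s(x) = u(b-s)/u(b-s-x) and u(y) = 1 for y < 0.
\begin{align*}
L_b &= \prod_{j=1}^{\tau_b} r_{S_{j-1}}(S_j - S_{j-1})
     = \prod_{j=1}^{\tau_b} \frac{u(b - S_{j-1})}{u(b - S_j)} \\
    &= \frac{u(b - S_0)}{u(b - S_{\tau_b})}
     = \frac{u(b)}{1} = u(b),
\end{align*}
% since S_0 = 0 and S_{\tau_b} > b. Under Q(\cdot) = P(\cdot \mid \tau_b < \infty)
% the event \{\tau_b < \infty\} has probability one, so
% Z_b = I(\tau_b < \infty) L_b = u(b) is constant and has zero variance.
```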

One of our main goals in this paper is to show that we can approximate the zero-variance change of measure quite accurately using finitely many mixtures whose parameters can be easily computed in advance. As a consequence, we can use Monte Carlo simulation not only to accurately estimate $u(b)$ but also associated conditional expectations of the random walk given $\tau_b < \infty$. In fact, we can improve upon the zero-variance change of measure in terms of overall computational cost when it comes to estimating sample-path conditional expectations given $\tau_b < \infty$ in situations where $E(\tau_b \mid \tau_b < \infty) = \infty$. The precise mathematical statements are given later in this section. Later sections are dedicated to the development and the proofs of these statements.

Before stating the main results, we first introduce the family of changes of measure, which is based on a mixture of finitely many distributions that are easy to compute and to simulate from.

2.1 The mixture family 

We start by describing the precise form of the mixtures that we will use to construct efficient 
importance sampling schemes. The family is constructed to consider the contribution of a 
"large jump" which makes the walk reach level b in the next step, a "regular jump" which 
allows the random walk to continue under (nearly) its original dynamics, and a number 
of "interpolating" contributions. This intuition is consistent with the way in which large 
deviations occur in heavy-tailed environments. 

If $b - s > \eta_*$, for $\eta_* > 0$ sufficiently large and to be specified in our analysis, we propose to use a finite mixture family of the form

$$q_s(x) = p_* f_*(x|s) + p_{**} f_{**}(x|s) + \sum_{j=1}^{k} p_j f_j(x|s), \tag{9}$$

where $p_*, p_{**}, p_j \in [0, 1)$, $p_* + p_{**} + \sum_{j=1}^{k} p_j = 1$, $k \in \mathbb{N}$, and $f_*$, $f_{**}$ and $f_j$ for $j = 1, \ldots, k$ are properly normalized density functions, whose supports are disjoint and depend on the "current" position of the walk, $s$. We will give specific forms momentarily. The choice of $k$ depends on the concavity of the cumulative hazard function, but otherwise is independent of $b$ and $s$. We will ultimately let $p_*$, $p_{**}$ and the $p_j$'s depend on $s$. In addition, we will also choose not to apply importance sampling if we are suitably close to the boundary level $b$. In other words, overall we have that

$$q_s(x) = \Big[p_* f_*(x|s) + p_{**} f_{**}(x|s) + \sum_{j=1}^{k} p_j f_j(x|s)\Big] I(b - s > \eta_*) + f(x) I(b - s \le \eta_*). \tag{10}$$

We next specify the functional forms of each mixture distribution. First,

$$f_*(x|s) = f(x) \frac{I\big(x \le b - s - \Lambda^{-1}(\Lambda(b - s) - a_*)\big)}{P\big(X \le b - s - \Lambda^{-1}(\Lambda(b - s) - a_*)\big)},$$

where $a_* > 0$. So, $f_*$ represents the mixture component corresponding to a "regular" increment.

Further, for $a_{**} > 0$, let

$$f_{**}(x|s) = f(x) \frac{I\big(x > \Lambda^{-1}(\Lambda(b - s) - a_{**})\big)}{P\big(X > \Lambda^{-1}(\Lambda(b - s) - a_{**})\big)},$$

which represents the mixture component corresponding to the situation in which the rare event occurs because this particular increment is large. Note that

$$P\big(X > b - s \mid X > \Lambda^{-1}(\Lambda(b - s) - a_{**})\big) = \exp(-a_{**}).$$

Therefore, if the "next increment", $X$, given the current position, $s$, is drawn from $f_{**}$, there is probability $1 - \exp(-a_{**}) > 0$ that the next position of the random walk, namely $s + X$, is below the threshold $b$. This particular feature is important in the variance control. It is necessary to introduce such a positive $a_{**}$ to achieve strong efficiency if we want to consider the possibility of rogue paths in our sampler.
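To make the role of $a_{**}$ concrete, the following minimal sketch samples the large-jump component for a hypothetical Lomax-type right tail with cumulative hazard $\Lambda(x) = \iota \log(1 + x)$ on $x \ge 0$. The tail index, the function names and the numerical values are assumptions chosen for illustration and are not taken from the paper. The sketch uses the fact that, for a continuous distribution, $\Lambda(X) - \Lambda(c)$ is a standard exponential random variable conditional on $X > c$.

```python
import math
import random

IOTA = 2.5  # hypothetical tail index, chosen only for this illustration


def cum_hazard(x):
    """Lambda(x) = IOTA * log(1 + x), assumed valid for x >= 0."""
    return IOTA * math.log1p(x)


def cum_hazard_inv(y):
    """Inverse of cum_hazard on y >= 0."""
    return math.expm1(y / IOTA)


def large_jump_threshold(gap, a2):
    """Lower end of the large-jump support: Lambda^{-1}(Lambda(gap) - a2),
    where gap plays the role of b - s and a2 the role of a_**."""
    return cum_hazard_inv(cum_hazard(gap) - a2)


def sample_large_jump(gap, a2, rng):
    """Draw X conditional on X > threshold by adding an Exp(1) variable
    to the cumulative hazard at the threshold."""
    c = large_jump_threshold(gap, a2)
    return cum_hazard_inv(cum_hazard(c) + rng.expovariate(1.0))


def crossing_fraction(gap, a2, n, seed=0):
    """Empirical fraction of large-jump draws that actually exceed gap."""
    rng = random.Random(seed)
    hits = sum(sample_large_jump(gap, a2, rng) > gap for _ in range(n))
    return hits / n
```

Under these assumptions, `crossing_fraction(50.0, 0.7, 200000)` should be close to `math.exp(-0.7)`, so a fraction of roughly $1 - \exp(-a_{**})$ of the large-jump draws stays below the barrier.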

As we mentioned before, the choice of $k$ depends on the "concavity" of the cumulative hazard function $\Lambda(\cdot)$. The more concave $\Lambda(\cdot)$ is, the smaller $k$ one can usually choose. In the regularly varying case, for example, a two-mixture distribution is sufficient (i.e. $k = 0$). The analysis of importance sampling algorithms in this case has been substantially studied in the literature (see [HI [151 [HI [13]). We can see that this feature is captured in our current formulation because in the regularly varying case one can find $a_*, a_{**} > 0$ such that

$$b - s - \Lambda^{-1}(\Lambda(b - s) - a_*) \ge \Lambda^{-1}(\Lambda(b - s) - a_{**}), \tag{11}$$

for all $b - s$ large enough, so that one can choose $k = 0$. Indeed, to see how (11) holds for the regularly varying case, just note that for any $a \in (0, 1)$ and each $t$, the inequality

$$at \ge \Lambda^{-1}(\Lambda(t) - a_{**})$$

is equivalent to

$$\frac{P(X > at)}{P(X > t)} \le \exp(a_{**}). \tag{12}$$

Similarly,

$$t - \Lambda^{-1}(\Lambda(t) - a_*) \ge at$$

holds if and only if

$$\frac{P(X > (1 - a)t)}{P(X > t)} \le \exp(a_*). \tag{13}$$

Karamata's theorem for regularly varying distributions ensures that it is always possible to choose $a_*, a_{**} > 0$, given any $a \in (0, 1)$, so that (12) and (13) hold uniformly in $t$, and therefore (11) holds. If Assumption A holds, we choose $a_{**}$ and then select $a_*$ (possibly depending on $b - s$) such that

$$b - s - \Lambda^{-1}(\Lambda(b - s) - a_*) = \Lambda^{-1}(\Lambda(b - s) - a_{**}). \tag{14}$$

This selection is slightly different from the two-mixture form that has been analyzed in the literature (see [l5l[TU[T3]), which involves a "regular" component with support on $(-\infty, a(b - s)]$ and a "large jump" component with support on $(a(b - s), \infty)$, for $a \in (0, 1)$. Our analysis here also applies to this parameterization. Nevertheless, to have unified statements in our results, under both Assumptions A and B, we opted for using equation (14).
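For intuition, the conditions relating $a_*$, $a_{**}$ and the cut-off fraction $a$ can be worked out in closed form for an exact Pareto tail. The tail $P(X > t) = t^{-\iota}$ for $t \ge 1$ used below is an assumption made only for this worked example (so $\Lambda(t) = \iota \log t$ and all arguments are taken to be at least 1).

```latex
% Worked example: exact Pareto tail, \bar F(t) = t^{-\iota}, \Lambda(t) = \iota \log t.
\begin{align*}
\frac{P(X > at)}{P(X > t)} &= a^{-\iota} \le \exp(a_{**})
   \iff a_{**} \ge \iota \log(1/a), \\
\frac{P(X > (1-a)t)}{P(X > t)} &= (1-a)^{-\iota} \le \exp(a_{*})
   \iff a_{*} \ge \iota \log\big(1/(1-a)\big).
\end{align*}
% Moreover \Lambda^{-1}(\Lambda(t) - a') = t e^{-a'/\iota}, so the inequality
% t - \Lambda^{-1}(\Lambda(t) - a_*) \ge \Lambda^{-1}(\Lambda(t) - a_{**}) reads
\begin{align*}
t\big(1 - e^{-a_*/\iota}\big) \ge t e^{-a_{**}/\iota}
   \iff e^{-a_*/\iota} + e^{-a_{**}/\iota} \le 1 .
\end{align*}
% Example: a = 1/2 gives a_* = a_{**} = \iota \log 2, for which
% e^{-a_*/\iota} + e^{-a_{**}/\iota} = 1/2 + 1/2 = 1, i.e. the equality case.
```

In this special case the conditions do not depend on $t$, which is the uniformity that Karamata's theorem provides asymptotically for general regularly varying tails.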

When (11) does not hold (for instance in the case of Weibull tails with shape parameter $\beta \in (0, 1)$), we will need more mixtures. In particular, we consider a set of cut-off points $c_0 < \ldots < c_k$ depending on $b - s$. Ultimately, we will have

$$c_j = a_j (b - s) \quad \text{for } j = 1, 2, \ldots, k - 1,$$

where $a_1 < \ldots < a_{k-1}$. The $a_j$'s are precomputed depending on $\beta_0$ (from Assumption B3) according to Lemma 9 (Section 4). We let $c_0 = b - s - \Lambda^{-1}(\Lambda(b - s) - a_*)$ and $c_k = \Lambda^{-1}(\Lambda(b - s) - a_{**})$. Given these values we define, for $1 \le j \le k - 1$,

$$f_j(x|s) = f(x) \frac{I(x \in (c_{j-1}, c_j])}{P(X \in (c_{j-1}, c_j])}.$$

For $j = k$,

$$f_k(x|s) = f(b - s - x) \frac{I(x \in (c_{k-1}, c_k])}{P(X \in (b - s - c_k, b - s - c_{k-1}])}.$$

In our previous notation, we then can write

$$r_s(x)^{-1} = \Big[ \frac{p_* I(x \le c_0)}{P(X \le c_0)} + \frac{p_{**} I(x > c_k)}{P(X > c_k)} + \sum_{j=1}^{k-1} \frac{p_j I(x \in (c_{j-1}, c_j])}{P(X \in (c_{j-1}, c_j])} + \frac{f(b - s - x)\, p_k I(x \in (c_{k-1}, c_k])}{f(x) P(X \in (b - s - c_k, b - s - c_{k-1}])} \Big] \times I(b - s > \eta_*) + I(b - s \le \eta_*).$$

With this family of changes of measure, we are ready to present our main results, which are based on appropriate choices of the various tuning parameters.
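The simplest member of this family is the two-component case $k = 0$ with a single cut-off $c = a(b - s)$. The sketch below draws one increment from such a mixture and returns the one-step weight $r_s(x) = f(x)/q_s(x)$. It is only an illustration under stated assumptions: the shifted Lomax increment distribution, the constants `IOTA` and `SHIFT`, the fixed (state-independent) mixture probability `p_large`, and all function names are hypothetical and not the paper's parameter selection.

```python
import random

IOTA = 2.5   # hypothetical tail index
SHIFT = 1.5  # hypothetical shift; E[X] = 1/(IOTA - 1) - SHIFT < 0


def tail(x):
    """P(X > x) for X = Y - SHIFT, with P(Y > y) = (1 + y)^(-IOTA), y >= 0."""
    return 1.0 if x <= -SHIFT else (1.0 + x + SHIFT) ** (-IOTA)


def inv_tail(v):
    """The point x with P(X > x) = v, for v in (0, 1]."""
    return v ** (-1.0 / IOTA) - 1.0 - SHIFT


def sample_two_mixture(gap, a, p_large, rng):
    """One increment from the k = 0 mixture with cut-off c = a * gap.

    Returns (x, r), where r = f(x) / q_s(x) is the one-step weight:
    P(X > c) / p_large on the large-jump part and
    P(X <= c) / (1 - p_large) on the regular part.
    """
    c = a * gap
    tail_c = tail(c)
    u = rng.random()
    if rng.random() < p_large:
        # Large jump: X conditioned on X > c (tail value uniform on (0, tail_c]).
        x = inv_tail(tail_c * (1.0 - u))
        r = tail_c / p_large
    else:
        # Regular jump: X conditioned on X <= c.
        x = inv_tail(1.0 - u * (1.0 - tail_c))
        r = (1.0 - tail_c) / (1.0 - p_large)
    return x, r


def one_step_estimate(gap, a, p_large, n, seed=0):
    """Importance sampling estimate of P(X > gap) from n weighted draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x, r = sample_two_mixture(gap, a, p_large, rng)
        if x > gap:
            total += r
    return total / n
```

Under these assumptions, `one_step_estimate(100.0, 0.9, 0.5, 100000)` should agree with the exact value `tail(100.0)` to within a few percent, which illustrates the unbiasedness of the weighted estimator for a single step; the full algorithm would apply such a step repeatedly with state-dependent parameters.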



2.2 Summary of the results 

Our first result establishes that one can explicitly choose $\eta_*$, the $c_j$'s, $a_*$, $a_{**}$, $p_*$, $p_{**}$ and the $p_j$'s in order to have a strongly efficient (in the terminology of rare-event simulation, see [5]) estimator.
Theorem 1 Under either Assumption A or Assumptions B1-3, there exists an explicit selection of $\eta_*$, the $c_j$'s, $a_*$, $a_{**}$, $p_*$, $p_{**}$ and the $p_j$'s so that the estimator $Z_b$ (defined as in (8)) is strongly efficient in the sense of being unbiased and having a bounded coefficient of variation. In particular, one can compute $K \in (0, \infty)$ (uniform in $b > 0$) such that

$$\frac{E^Q Z_b^2}{(E^Q Z_b)^2} = \frac{E[L_b I(\tau_b < \infty)]}{u(b)^2} < K$$

for $b > 0$.

The proof of this result is given at the end of Section 4. The explicit parameter selection is discussed in items I) to IV) stated in Section 4. A consequence of this result is that, by Chebyshev's inequality, at most $n = O(\varepsilon^{-2}\delta^{-1})$ i.i.d. replications of $Z_b$ are enough in order to estimate $u(b)$ with $\varepsilon$-relative precision and with probability at least $1 - \delta$ uniformly in $b$. Because the estimator $Z_b$ is based on importance sampling, one can estimate a large class of expectations of the form $u_H(b) = E(H(S_n : n \le \tau_b) \mid \tau_b < \infty)$ with roughly the same number of replications in order to achieve $\varepsilon$-relative precision with at least $1 - \delta$ probability (uniformly in $b$). Indeed, if $K_1 \in (0, \infty)$ is such that $K_1^{-1} \le H \le K_1$, then we have that $u_H(b) \ge K_1^{-1}$. We also have that $L_b I(\tau_b < \infty) H(S_n : n \le \tau_b)$ is an unbiased estimator for $E(H(S_n : n \le \tau_b); \tau_b < \infty)$ and its second moment is bounded by $K_1^2 E^Q Z_b^2 \le K_1^2 K u(b)^2$. Therefore, we can estimate both the numerator and the denominator in the expression

$$u_H(b) = E(H(S_n : n \le \tau_b) \mid \tau_b < \infty) = \frac{E(H(S_n : n \le \tau_b); \tau_b < \infty)}{u(b)}$$

with good relative precision (uniformly in $b$). Naturally, the condition $K_1^{-1} \le H \le K_1$ is just given to quickly explain the significance of the previous observation. More generally, one might expect strong efficiency for $u_H(b)$ using an importance sampling estimator designed to estimate $u(b)$ if $u_H(b) \in (K_1^{-1}, K_1)$ uniformly in $b$.
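The replication count implied by Chebyshev's inequality can be made explicit. If $K$ bounds $E^Q Z_b^2 / (E^Q Z_b)^2$, then the sample mean of $n$ i.i.d. replications misses $u(b)$ by more than $\varepsilon u(b)$ with probability at most $(K - 1)/(n \varepsilon^2)$. The helper below is a small sketch of this calculation; the function name and argument names are hypothetical.

```python
import math


def replications_needed(second_moment_ratio, eps, delta):
    """Smallest n with (K - 1) / (n * eps**2) <= delta.

    second_moment_ratio is a bound K on E[Z^2] / (E[Z])^2 (so K >= 1),
    eps is the target relative precision and 1 - delta the confidence.
    """
    if second_moment_ratio < 1.0 or eps <= 0.0 or not 0.0 < delta < 1.0:
        raise ValueError("need K >= 1, eps > 0 and 0 < delta < 1")
    n = (second_moment_ratio - 1.0) / (eps ** 2 * delta)
    return max(1, math.ceil(n))
```

For instance, `replications_needed(3.0, 0.25, 0.5)` returns 64. Because $K$ does not depend on $b$, neither does this count, which is the practical content of strong efficiency.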

Given that nothing has been said about the cost of generating a single replication of $Z_b$, strong efficiency is clearly not a concept that allows one to accurately assess the total computational cost of estimating $u(b)$ or $u_H(b)$. For this reason, we will also provide results that estimate the expected cost required to generate a single replication of $Z_b$. However, before we state our estimates for the cost per replication, it is worth discussing the performance of the zero-variance change of measure for the regularly varying case. The following classical result ([6]) provides a good description of $(S_n : n \ge 0)$ given $\tau_b < \infty$.



Theorem 2 (Asmussen and Klüppelberg) Suppose that $X$ is regularly varying with index $\iota > 1$ and define $a(b) = \int_b^{\infty} P(X > u)\,du / P(X > b)$. Then, conditional on $\tau_b < \infty$ we have that

where the convergence occurs in the space $\mathbb{R} \times D[0,1) \times \mathbb{R}$, $P(Y_i > t) = (1 + t/(\iota - 1))^{-\iota + 1}$ for $t \ge 0$ and $i = 0, 1$, and $P(Y_0 > y_0, Y_1 > y_1) = P(Y_0 > y_0 + y_1)$.
Remark 2 The previous result suggests that if Assumption A holds, the best possible performance that one might realistically expect is $E^Q \tau_b = O(b)$ as long as (very important!) $\iota > 2$. The full statement of Asmussen and Klüppelberg's result (Theorem 1.1 in their paper) also covers other subexponential distributions. For instance, in the case of Weibull-type tails with shape parameter $\beta_0$, their result suggests that $E(\tau_b \mid \tau_b < \infty) = O(b^{1 - \beta_0})$.

As the next theorem states, for the regularly varying case with $\iota > 1.5$, we can guarantee $E^Q \tau_b = O(b)$ while maintaining strong efficiency as stated in Theorem 1. We will also indicate why we believe that this result is basically the best possible that can be obtained among a reasonable class of importance sampling distributions.

Theorem 3

• If Assumption A holds and $\iota > 1.5$, then there exists an explicit selection of $\eta_*$, the
$c_j$'s, $a_*$, $a_{**}$, $p_*$, $p_{**}$, such that strong efficiency (as indicated in Theorem 1) holds and

$$E^Q\tau_b \le \rho_0 + \rho_1 b$$

for some $\rho_0, \rho_1 > 0$ independent of $b$.

• If Assumptions B1-3 hold, we assume there exist $\delta > 0$ and $\beta \in [0, \beta_0]$ such that
$\lambda(x) \ge \delta x^{\beta - 1}$ for $x$ sufficiently large. Then, with the parameters selected in Theorem
1, there exist $\rho_0$ and $\rho_1$ independent of $b$, such that

$$E^Q\tau_b \le \rho_0 + \rho_1 b^{1-\beta}.$$

Remark 3 The results in this theorem follow directly from the corresponding propositions
in Section 5. For the regularly varying case (Assumption A), in addition to the
explicit parameter selection indicated in items I) to IV) in Section 4, which guarantee strong
efficiency, we also add item V) in Section 5, which explicitly indicates how to select the
parameters to obtain $O(b)$ expected stopping time while maintaining strong efficiency. We
assume that it takes at most a fixed cost $c$ of computer time units to generate a variable
from $Q_s(\cdot)$ (uniformly in $s$). The previous result implies that if $X$ is regularly varying with
index $\iota > 1.5$, then our importance sampling family estimates $u(b)$ and associated conditional
expectations such as $u_h(b)$ in $O(\varepsilon^{-2}\delta^{-1} b)$ units of computer time. This is in some sense
(given that we have linear complexity in $b$ even if $\iota \in (1.5, 2)$) better than what one might
expect in view of Theorem 2. We will further provide an argument, see Remark 6 in Section
5, for why in the presence of regular variation $\iota > 3/2$ appears to be basically a necessary
condition to obtain strongly efficient unbiased estimators with $O(b)$ expected termination
time.

Remark 4 For the second case in Theorem 3, note that when Assumption B1 holds, one
can always choose $\beta = 0$ and $\delta$ arbitrarily large. This implies that the expected termination
time is at most $O(b)$ under Assumption B. It is desirable to choose $\beta$ as large as possible
because this yields an (asymptotically) smaller termination time. However, there is an upper
bound, namely $\beta_0$, which can be derived from Assumption B2 (Lemma 1).
For the regularly varying case, we provide further results for all $\iota > 1$. If $\iota > 1$, we are
able to construct an importance sampling estimator $Z_b$ such that for some $\gamma > 0$ we can
guarantee $E^Q(Z_b^{1+\gamma}) \le K u(b)^{1+\gamma}$ and at the same time $E^Q\tau_b = O(b)$. The next result,
whose proof is given at the end of Section 5, allows us to conclude that this can be achieved
with our method as well.

Theorem 4 Suppose that Assumption A is in force and $\iota \in (1, 1.5]$. Then, for each
$\gamma \in (0, (\iota - 1)/(2 - \iota))$ we can select $K > 0$ and a member of our family of importance sampling
distributions such that

$$E^Q(Z_b^{1+\gamma}) \le K u(b)^{1+\gamma}$$

for all $b > 0$ and $E^Q(\tau_b) \le \rho_0 + \rho_1 b$ for $\rho_0, \rho_1 \in (0, \infty)$. Consequently, assuming that each
increment under $Q(\cdot)$ takes at most constant units of computer time, $O(\varepsilon^{-2/\gamma}\delta^{-1/\gamma} b)$
expected total cost is required to obtain an estimate for $u(b)$ with $\varepsilon$ relative error and with
probability at least $1 - \delta$.

Remark 5 Similar to the case of controlling the second moment, we believe that the upper
bound $(\iota - 1)/(2 - \iota)$ is optimal within a reasonable class of simulation algorithms. A heuristic
argument will be given later in the paper.



Finally, the proposed family of changes of measure and the analysis techniques are useful not
only for Monte Carlo simulation purposes but also for asymptotic analysis. We provide the
following approximation results, which improve upon classical results in the literature such as
Theorem 2. By appropriately tuning various parameters in our family we can approximate
$P(S \in \cdot \mid \tau_b < \infty)$ by $Q(S \in \cdot)$ asymptotically as $b \to \infty$. We will explicitly indicate how to
do so in later analysis.



Theorem 5 Under either Assumptions A or B1-3, there exists an explicit selection of $\eta_*$,
the $c_j$'s, $a_*$, $a_{**}$, $p_*$, $p_{**}$ and the $p_j$'s so that

$$\lim_{b \to \infty} \sup_A \left| P(S \in A \mid \tau_b < \infty) - Q(S \in A) \right| = 0.$$

The previous result is an immediate consequence of Lemma 10 combined with Theorem
1. It further shows that our mixture family is an appropriate vehicle to approximate the
conditional distribution of the random walk given $\tau_b < \infty$. Moreover, due to the convenience
of the mixture form, as a corollary of the previous theorem and using a coupling technique,
we can show, without much additional effort, the following theorem, which further extends
Theorem 1.1 in [6] by adding a central limit theorem correction term. This theorem is proven
at the end of Section 6.2.



Theorem 6 Suppose that either Assumption A or Assumptions B1-4 are in force. Let
$\sigma^2 = Var(X_1) < \infty$ and $a(b) = \int_b^\infty P(X > u)\,du / P(X > b)$. Then



aib) y ^ Jo<,^^ a{b) J 
in $R \times D[0,1) \times R$. Here $\{B(t) : 0 \le t \le 1\}$ is a standard Brownian motion independent of
$(Y_0, Y_1)$. The joint law of $Y_0$ and $Y_1$ is defined as follows. First, $P(Y_0 > y_0, Y_1 > y_1) =
P(Y_1 > y_0 + y_1)$ with $Y_0 \stackrel{d}{=} Y_1$ and

• If Assumption A holds then $P(Y_1 > t) = (1 + t/(\iota - 1))^{-(\iota - 1)}$ for $t \ge 0$, as in Theorem 2.

• If Assumptions B1-4 hold, then $Y_1$ follows an exponential distribution with mean 1 and
consequently $Y_0$ and $Y_1$ are independent.

3 Preliminaries: Heavy tails, importance sampling and 
Lyapunov inequalities 

3.1 Heavy tails 

A non-negative random variable $Y$ is said to be heavy-tailed if $E\exp(\theta Y) = \infty$ for every
$\theta > 0$. This class is too big to develop a satisfactory asymptotic theory of large deviations and
therefore one often considers the subexponential distributions, which are defined as follows.

Definition 1 Let $Y_1, \ldots, Y_n$ be independent copies of a non-negative random variable $Y$. The
distribution of $Y$ (or $Y$ itself) is said to be subexponential if and only if

$$\lim_{u \to \infty} \frac{P(Y_1 + \cdots + Y_n > u)}{P(Y > u)} = n.$$

Actually it is necessary and sufficient to verify the previous limit for $n = 2$ only.
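
To make the definition concrete, the following minimal sketch checks the limit numerically for $n = 2$. It assumes, purely for illustration, a Pareto-type law $P(Y > y) = (1 + y)^{-2.5}$ (the tail index 2.5 and all function names are hypothetical choices, not taken from the paper) and evaluates $P(Y_1 + Y_2 > u)$ by a midpoint-rule convolution.

```python
ALPHA = 2.5  # assumed Pareto-type tail index, chosen only for illustration


def tail(y):
    """P(Y > y) for the assumed law P(Y > y) = (1 + y)^(-ALPHA), y >= 0."""
    return (1.0 + y) ** (-ALPHA) if y > 0 else 1.0


def density(y):
    """Density of Y on [0, infinity)."""
    return ALPHA * (1.0 + y) ** (-ALPHA - 1.0)


def tail_of_sum(u, panels=200_000):
    """P(Y1 + Y2 > u) = P(Y1 > u) + int_0^u P(Y2 > u - y) f(y) dy (midpoint rule)."""
    h = u / panels
    integral = h * sum(
        tail(u - (i + 0.5) * h) * density((i + 0.5) * h) for i in range(panels)
    )
    return tail(u) + integral


def subexponential_ratio(u):
    """Ratio P(Y1 + Y2 > u) / P(Y > u), which should approach 2 as u grows."""
    return tail_of_sum(u) / tail(u)
```

For large `u` the ratio returned by `subexponential_ratio` should be close to 2, in line with the "single large jump" intuition behind subexponentiality.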

Examples of distributions that satisfy the subexponential property include Pareto
distributions, Lognormal distributions, Weibull distributions (with shape parameter less than one), and so forth. A general random
variable $X$ is said to have a subexponential right tail if $X^+$ is subexponential. In such a
case, we simply say that $X$ is subexponential.

If $X$ is subexponential, then $X$ satisfies $P(X > x + h)/P(X > x) \to 1$ as $x \to \infty$
for each $h \in (-\infty, \infty)$. A random variable with this property is said to possess a "long tail".
It turns out that there are long tailed random variables that do not satisfy the subexponential
property (see [21]).

In order to verify the subexponential property in the context of random variables with 
a density function (as we shall assume here) one often takes advantage of the so-called 
cumulative hazard function. Indeed, a sufficient condition to guarantee subexponentiality 
due to Pitman is given next (see [21]). 

Proposition 1 A random variable $X$ with concave cumulative hazard function $\Lambda(\cdot)$ and
hazard function $\lambda(\cdot)$ is subexponential if

$$\int_0^\infty \exp\left(x\lambda(x) - \Lambda(x)\right) dx < \infty.$$
A distinctive feature of heavy-tailed random walks is that the rare event $\{\sup_n S_n > b\}$
is asymptotically (as $b \to \infty$) caused by a single large increment, while other increments
behave like "regular" ones. Therefore, one can obtain the following approximation, often
called the fluid heuristic, for the probability $u(b)$:

$$u(b) = P(\tau_b < \infty) = \sum_{k=1}^{\infty} P(\tau_b = k) \approx \sum_{k=1}^{\infty} P(X_k > b - (k-1)\mu) \approx -\frac{1}{\mu}\int_b^\infty P(X > s)\,ds. \qquad (15)$$

For notational convenience, we denote the integrated tail by

$$G(x) = \int_x^\infty P(X > s)\,ds. \qquad (16)$$

The previous heuristic can actually be made rigorous under subexponential assumptions.
This is the content of the Pakes-Veraverbeke theorem, which we state next (see page 296 in
0). 

Theorem 7 (Pakes-Veraverbeke) If $F$ is long tailed (i.e. $\bar F(x + h)/\bar F(x) \to 1$ as $x \to \infty$
for every $h > 0$) and $\int_t^\infty P(X > s)\,ds / EX^+$ is subexponential (as a function of $t$) then

$$u(b) = -(\mu^{-1} + o(1))\,G(b), \qquad (17)$$

as $b \to \infty$.
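
The passage from the sum over the time of the big jump to the integrated tail can be checked numerically. The sketch below assumes, for illustration only, the tail $P(X > s) = (1 + s)^{-2.5}$ for $s \ge 0$ and drift $\mu = -1$ (both hypothetical choices), and compares $\sum_{k \ge 1} P(X > b - (k-1)\mu)$ with $-G(b)/\mu$.

```python
ALPHA = 2.5   # assumed tail index, for illustration only
MU = -1.0     # assumed (negative) mean increment


def tail(s):
    """Assumed tail P(X > s) = (1 + s)^(-ALPHA), valid for s >= 0."""
    return (1.0 + s) ** (-ALPHA)


def integrated_tail(x):
    """G(x) = int_x^infty P(X > s) ds, in closed form for the assumed tail."""
    return (1.0 + x) ** (-(ALPHA - 1.0)) / (ALPHA - 1.0)


def one_big_jump_sum(b, terms=200_000):
    """sum_{k>=1} P(X > b - (k-1) MU); the remainder beyond `terms` terms is
    approximated by the corresponding integral."""
    head = sum(tail(b - k * MU) for k in range(terms))
    remainder = integrated_tail(b - terms * MU) / abs(MU)
    return head + remainder


def fluid_approximation(b):
    """Right-hand side -G(b) / mu of the fluid heuristic."""
    return -integrated_tail(b) / MU
```

Because the tail is decreasing, the sum exceeds the integral by at most $P(X > b)$, so the two quantities agree up to a relative error of order $\bar F(b)/G(b)$, which vanishes as $b$ grows.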

We close this subsection with a series of lemmas involving several properties which will
be useful throughout the paper. The proofs of these results are given in the Appendix.



Lemma 1 If B2 holds then $\lambda(x) = O(x^{\beta_0 - 1})$ as $x \to \infty$.

Lemma 2 Under Assumption B3 there exist a constant $\kappa_1$ (depending on $a_*$) and $b_0$, such
that for all $x \le b - \Lambda^{-1}(\Lambda(b) - a_*)$ and $b \ge b_0$, the integrated tail satisfies

$$G(b - x)/G(b) \le \kappa_1.$$

Lemma 3 Suppose B1 and B3 are in force. For each $\varepsilon_0 > 0$, there exists $b_0 > 0$ such that

$$\varepsilon_0^{-1}\bar F(b) \le G(b) \le \varepsilon_0 b \bar F(b),$$

for all $b \ge b_0$. In particular, $\bar F(b)/G(b) = o(1)$ as $b \to \infty$. If Assumption A holds then for
each $\delta_0 > 0$ we can select $b_0 > 0$ sufficiently large so that

$$\frac{1 - \delta_0}{\iota - 1}\, b \bar F(b) \le G(b) \le \frac{1 + \delta_0}{\iota - 1}\, b \bar F(b)$$

for $b \ge b_0$, where $\iota$ is the tail index of $F$ defined in Assumption A.
Lemma 4 Suppose B2 holds. Then for all $x \ge b_0$ and $y \ge 0$ we have

$$\frac{\Lambda(x)}{\Lambda(x + y)} \ge \left(\frac{x}{x + y}\right)^{\beta_0}.$$

Lemma 5 Suppose B2 is satisfied. Then, we can choose $b_0 > 0$ sufficiently large such that

X- A"^(A(x) - a.) > 

for all $x \ge b_0$.

The following lemma allows us to conclude that the Pakes-Veraverbeke theorem is
applicable in our setting.

Lemma 6 Under either Assumption A or B1-3, both $\bar F(x)$ and $\int_x^\infty P(X > s)\,ds / (EX^+)$ are
subexponential as a function of $x$.

3.2 State-dependent importance sampling for the first passage 
time random walk problem and Lyapunov inequalities 

Consider two probability measures $P$ and $Q$ on a given space $\mathcal{X}$ with $\sigma$-algebra $\mathcal{F}$. If the
Radon-Nikodym derivative $\frac{dP}{dQ}(\cdot)$ is well defined on the set $A \in \mathcal{F}$, then

$$P(A) = \int \frac{dP}{dQ}(\omega)\, I_A(\omega)\, Q(d\omega).$$

We say that the random variable $\frac{dP}{dQ}(\omega) I_A(\omega)$ is the importance sampling estimator
associated to the change of measure / importance sampling distribution $Q$. If one chooses $Q'$ such
that for each $B \in \mathcal{F}$

$$Q'(B) = P(B \cap A)/P(A),$$

then $\frac{dP}{dQ'} = P(A)$ almost surely on the set $A$ and therefore the estimator $\frac{dP}{dQ'}(\omega) I_A(\omega)$ has zero
variance. This implies that the best importance sampling distribution (with zero variance
for estimating $P(A)$) is the conditional distribution given that the event $A$ occurs.

Certainly, this zero variance estimator is not implementable in practice, because the 
Radon-Nikodym derivative involves precisely computing P{A), which is the quantity to com- 
pute. Nevertheless, it provides a general guideline on how to construct efficient importance 
sampling estimators: try to mimic the conditional distribution given the event of interest. 
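
As a toy illustration of the zero-variance property, the sketch below estimates $P(X > b)$ for a standard exponential $X$ by sampling from the conditional law given $\{X > b\}$. The exponential model and the function name are assumptions made only for this example; the light-tailed choice is convenient because the conditional law is available in closed form.

```python
import math
import random


def zero_variance_samples(b, n, seed=0):
    """Importance sampling draws for P(X > b), X ~ Exp(1), under Q' = law of X given X > b."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x = b + rng.expovariate(1.0)               # memorylessness: X | X > b
        dP_dQ = math.exp(-x) / math.exp(-(x - b))  # f(x) / q'(x) on {x > b}
        samples.append(dP_dQ)                      # the indicator of A equals 1 under Q'
    return samples
```

Every draw equals $e^{-b} = P(X > b)$ up to rounding, so the estimator has zero variance, but writing down `dP_dQ` required knowing the normalizing constant $P(X > b)$ itself, which is exactly why the conditional law can only serve as a guideline.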

In the context of this paper, we consider a random walk $(S_n : n \ge 0)$ with $S_0 = 0$ and
therefore

$$P(X_{n+1} \in dx \mid S_1, \ldots, S_n) = F(dx).$$

A state-dependent importance sampling distribution $Q$ is such that

$$Q(X_{n+1} \in dx \mid S_1, \ldots, S_n) = r_{S_n}^{-1}(x)\, F(dx), \qquad (18)$$

where the function $(r_s(x) : s, x \in R)$ is non-negative and satisfies

$$\int_{-\infty}^{\infty} r_s^{-1}(x)\, F(dx) = 1.$$
Now, consider the stopping time $\tau_b = \inf\{n \ge 0 : S_n > b\}$ and set $A_b = \{\tau_b < \infty\}$. Then
it follows easily that

$$P(A_b) = E^Q\Big( I_{A_b} \prod_{i=1}^{\tau_b} r_{S_{i-1}}(S_i - S_{i-1}) \Big).$$

Notational convention: throughout the paper we shall use $E^Q_s(\cdot)$ to denote the
expectation operator induced by (18) assuming that $S_0 = s$. We simply write $E^Q(\cdot)$ whenever
$S_0 = 0$.

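We will work with the specific parametric selection of $r_s(x)$ introduced in Section 2. In
proving some of our main results we will be interested in finding an upper bound for the
second moment of our estimator under $E^Q(\cdot)$, namely

$$E^Q\Big( I_{A_b} \prod_{i=1}^{\tau_b} r_{S_{i-1}}^2(S_i - S_{i-1}) \Big) = E\Big( I_{A_b} \prod_{i=1}^{\tau_b} r_{S_{i-1}}(S_i - S_{i-1}) \Big).$$

In general, the $(1 + \gamma)$-th moment ($\gamma > 0$) of our estimator satisfies

$$E^Q\Big( I_{A_b} \prod_{i=1}^{\tau_b} r_{S_{i-1}}^{1+\gamma}(S_i - S_{i-1}) \Big) = E\Big( I_{A_b} \prod_{i=1}^{\tau_b} r_{S_{i-1}}^{\gamma}(S_i - S_{i-1}) \Big).$$

The next lemma provides the mechanism that we shall use to obtain upper bounds for these
quantities. The proof can be found in [10].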



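Lemma 7 Assume that there exists a non-negative function $g : R \to R^+$ such that for all
$s < b$,

$$g(s) \ge E\left(g(s + X)\, r_s(X)^{\gamma}\right),$$

where $X$ is a random variable with density $f(\cdot)$, and suppose that for all $s \ge b$, $g(s) \ge \varepsilon$.
Then,

$$g(0) \ge \varepsilon\, E\Big( I_{A_b} \prod_{i=1}^{\tau_b} r_{S_{i-1}}(S_i - S_{i-1})^{\gamma} \Big). \qquad (19)$$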



Most of the time we will work with $\gamma = 1$ (i.e. we concentrate on the second moment).
The inequality assumed in Lemma 7 is said to be a Lyapunov inequality. The function $g$ is called a Lyapunov
function. Lemma 7 provides a handy tool to derive an upper bound for the second moment
of the importance sampling estimator. However, the lemma does not provide a recipe for
constructing a suitable Lyapunov function. We will discuss the intuition behind the
construction of our Lyapunov function in later sections.

If $r_s(x)$ has been chosen in such a way that the second moment of the importance sampling
estimator can be suitably controlled by an appropriate selection of a Lyapunov function $g$,
we still need to make sure that the cost per replication (i.e. $E^Q\tau_b$) is suitably controlled as
well. The next lemma, which follows exactly the same steps as in the first part of the proof
in Theorem 11.3.4 of [23], establishes a Lyapunov criterion required to control the behavior
of $E^Q\tau_b$.
Lemma 8 Suppose that one can find a non-negative function $h(\cdot)$ and a constant $\rho > 0$ so
that

$$E^Q_s\left(h(s + X)\right) \le h(s) - \rho,$$

for $s < b$. Then, $E^Q(\tau_b \mid S_0 = s) \le h(s)/\rho$ for $s < b$.

Most of the results discussed in Section 2 of the paper involve constructing suitable 
selections of Lyapunov functions g and h appearing in the previous lemmas. The construction 
of these functions is given in subsequent sections. 

4 Lyapunov function for variance control 

Our approach to designing efficient importance sampling estimators consists of three steps:

1. Propose a family of change of measures suitably parameterized. 

2. Propose candidates of Lyapunov functions using fluid heuristics and also depending on 
appropriate parameters. 

3. Verify the Lyapunov inequality by choosing appropriate parameters for the change of 
measure and the Lyapunov function. 

Our family has been introduced in Section 2. This corresponds to the first step. The
second and third steps are done simultaneously. We will choose the parameters $\eta_*$, the $c_j$'s,
$a_*$, $a_{**}$, $p_*$, $p_{**}$ and the $p_j$'s of our change of measure in order to satisfy an appropriate
Lyapunov inequality for variance control by means of Lemma 7. Some of the parameters, in
particular the $c_j$'s, can be set in advance without resorting to the appropriate Lyapunov
function. The key element is given in the next lemma, whose proof is given in the appendix.

Lemma 9 Fix $\beta_0 \in (0, 1)$ and select $\sigma_1 > 0$ sufficiently small such that for every $x \in [0, \sigma_1]$,
$2 - 2(1 - x)^{\beta_0} - x^{\beta_0} \le 0$. Then, there exist $\sigma_2 > 0$ and a sequence $0 < a_1 < a_2 < \cdots <
a_{k-1} < 1$ such that $a_{j+1} - a_j \le \sigma_1/2$ for each $1 \le j \le k - 2$,

$$a_j^{\beta_0} + (1 - a_{j+1})^{\beta_0} \ge 1 + \sigma_2,$$

and $a_{k-1} \ge 1 - \sigma_1$, $a_1 \le \sigma_1$.
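
The admissible range for $\sigma_1$ in Lemma 9 can be explored numerically. The sketch below scans a grid for the largest $\sigma_1$ such that $2 - 2(1-x)^{\beta_0} - x^{\beta_0} \le 0$ at every grid point of $(0, \sigma_1]$; the grid resolution and function names are illustrative assumptions, and a grid scan is of course only a numerical check, not a proof.

```python
def lemma9_expression(x, beta0):
    """2 - 2 (1 - x)^beta0 - x^beta0, required to be non-positive on [0, sigma_1]."""
    return 2.0 - 2.0 * (1.0 - x) ** beta0 - x ** beta0


def largest_admissible_sigma1(beta0, grid=1000):
    """Largest grid point sigma in (0, 1) such that the expression is <= 0
    at every grid point of (0, sigma]."""
    sigma = 0.0
    for i in range(1, grid):
        x = i / grid
        if lemma9_expression(x, beta0) > 0.0:
            break
        sigma = x
    return sigma
```

For instance, with $\beta_0 = 1/2$ the expression vanishes at $x = 0.64$ (since $2 - 2\sqrt{0.36} - \sqrt{0.64} = 0$), so any $\sigma_1$ below that value is admissible; the scan returns a grid point next to this root.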

Given $\beta_0$ in Assumption B2, from now on, we choose

$$c_0 = b - s - \Lambda^{-1}(\Lambda(b - s) - a_*), \quad c_k = \Lambda^{-1}(\Lambda(b - s) - a_{**}), \quad c_j = a_j (b - s), \qquad (20)$$

for $j = 1, \ldots, k - 1$, with $\sigma_1$ chosen small enough and $a_j = a_{j-1} + \sigma_1/2$ according to the
previous lemma.

We continue with the second step of our program. We concentrate on bounding the
second moment and discuss the case of the $(1 + \gamma)$-th moment later. The value of the Lyapunov
function at the origin, namely $g(0)$ in Lemma 7, serves as the upper bound of the second
moment of the importance sampling estimator. In order to prove strong efficiency, we aim
to show that there exists a constant $c < \infty$ such that

$$E^Q Z_b^2 \le c\, u^2(b),$$

where

$$Z_b = I(\tau_b < \infty) \prod_{i=1}^{\tau_b} r_{S_{i-1}}(S_i - S_{i-1}) \qquad (21)$$

is the estimator of $u(b)$. Therefore, a useful Lyapunov function for proving strong efficiency
must satisfy

$$g(0) \le c\, u^2(b).$$

It is natural to consider using an approximation of $u^2(b - s)$ as the candidate. Exactly the
same type of fluid heuristic analysis that we used in (15) suggests

$$g(s) = \min\{\kappa G^2(b - s), 1\}, \qquad (22)$$

where $G$ is the integrated tail defined in (16) and $\kappa$ is a non-negative tuning parameter which
will be determined later.

It is important to keep in mind that $g(s)$ certainly depends on $b$. For notational simplicity,
we omit the parameter $b$. The function $g(s)$ will also dictate when we are close enough to
the boundary level $b$ where importance sampling is not required. In particular, using our
notation in (10) and (18) we propose choosing $\eta_* = G^{-1}(\kappa^{-1/2})$, which amounts to choosing

$$r_s^{-1}(x) = \left( \frac{p_* I(x \le c_0)}{P(X \le c_0)} + \frac{p_{**} I(x > c_k)}{P(X > c_k)} + \sum_{j=1}^{k-1} \frac{p_j I(x \in (c_{j-1}, c_j])}{P(X \in (c_{j-1}, c_j])} + \frac{f(b - s - x)\, p_k\, I(x \in (c_{k-1}, c_k])}{f(x)\, P(X \in (b - s - c_k, b - s - c_{k-1}])} \right) I(g(s) < 1) + I(g(s) = 1).$$

Now we proceed to the last step, the verification of the Lyapunov inequality. The
Lyapunov inequality in Lemma 7 is equivalent to

$$\frac{E\left(r_s(X)\, g(s + X)\right)}{g(s)} \le 1. \qquad (23)$$

The interesting part of the analysis is the case $g(s) < 1$ because whenever $g(s) = 1$ the
inequality is trivially satisfied given that $0 \le g(s + X) \le 1$. Hereafter, we will focus on the
case $g(s) < 1$.
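The left hand side of (23) can be decomposed into the following pieces,

$$\frac{E(r_s(X) g(s + X))}{g(s)} = \frac{P(X \le c_0)}{p_*} E\Big(\frac{g(s+X)}{g(s)}; X \le c_0\Big) + \frac{P(X > c_k)}{p_{**}} E\Big(\frac{g(s+X)}{g(s)}; X > c_k\Big)$$
$$\qquad + \sum_{i=1}^{k-1} \frac{P(X \in (c_{i-1}, c_i])}{p_i} E\Big(\frac{g(s+X)}{g(s)}; X \in (c_{i-1}, c_i]\Big) + \frac{P(b - s - X \in (c_{k-1}, c_k])}{p_k} E\Big(\frac{g(s+X) f(X)}{g(s) f(b - s - X)}; X \in (c_{k-1}, c_k]\Big),$$

with $c_0 = b - s - \Lambda^{-1}(\Lambda(b - s) - a_*)$ and $c_k = \Lambda^{-1}(\Lambda(b - s) - a_{**})$ as in (20).
We adopt the following notation

$$J_* = P(X \le b - s - \Lambda^{-1}(\Lambda(b - s) - a_*))\, E\Big(\frac{g(s+X)}{g(s)}; X \le b - s - \Lambda^{-1}(\Lambda(b - s) - a_*)\Big), \qquad (24)$$

$$J_{**} = P(X > \Lambda^{-1}(\Lambda(b - s) - a_{**}))\, E\Big(\frac{g(s+X)}{g(s)}; X > \Lambda^{-1}(\Lambda(b - s) - a_{**})\Big), \qquad (25)$$

$$J_i = P(X \in (c_{i-1}, c_i])\, E\Big(\frac{g(s+X)}{g(s)}; X \in (c_{i-1}, c_i]\Big), \quad \text{for } i = 1, \ldots, k - 1, \qquad (26)$$

$$J_k = P(b - s - X \in (c_{k-1}, c_k])\, E\Big(\frac{g(s+X) f(X)}{g(s) f(b - s - X)}; X \in (c_{k-1}, c_k]\Big), \qquad (27)$$

so that inequality (23) is equivalent to showing that

$$\frac{J_*}{p_*} + \frac{J_{**}}{p_{**}} + \sum_{i=1}^{k} \frac{J_i}{p_i} \le 1.$$

We shall study each of these terms separately.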

At this point it is useful to provide a summary of all the relevant constants and parameters
introduced so far:

• $\iota > 1$ is the regularly varying index under Assumption A.

• $b_0 > 0$ is introduced in Assumption B, Lemmas 3 and 5 to ensure regularity properties.

• $\beta_0 \in (0, 1)$ is introduced in B2 to guarantee that the distribution considered is "heavier"
than a Weibull distribution with shape parameter $\beta_0$.

• $a_*, a_{**} > 0$ are introduced to define the mixture components corresponding to a "regular
jump" and a "large jump" respectively.

• $a_1 < \cdots < a_{k-1}$ are defined according to Lemma 9.

• $c_j$ for $j = 0, 1, \ldots, k$ are defined in (20) and correspond to the end points of the support
of the interpolating mixture components.

• $\eta_*$ and $\kappa$ are parameters for the Lyapunov function. They are basically equivalent since
$\eta_* = G^{-1}(\kappa^{-1/2})$ and $\kappa$ appears in the definition of the Lyapunov function. It is important
to keep in mind that by letting $\kappa$ be large, the condition $g(s) < 1$ implies that $b - s > \eta_*$
is large.

• $\varepsilon_0, \delta_0$ are arbitrarily small constants introduced in Lemma 3.

• The parameters $p_*$, $p_{**}$ and $p_i$ for $i = 1, \ldots, k$ are the mixture probabilities and will
depend on the current state $s$.

Other critical constants which will be introduced in the sequel are:

• $\delta_0^* > 0$ is a small parameter which appears in the analysis of $J_*$. It will be introduced
in Proposition 2.

• $\delta_1^* > 0$, a small parameter, appears in the definition of the $p_i$'s and the overall contribution
of the $J_i$'s. It will be introduced in step III) of the parameter selection process.

• $\delta_2^* > 0$ is introduced to control the termination time of the algorithm. It ultimately
provides a link between $a_{**} > 0$ and $\delta_0^* > 0$ in Section 5.

• Parameters $\theta$, $\tilde\varepsilon$ and $\varepsilon_1$, which are introduced to specify the probabilities $p_{**}$ and the
$p_i$'s respectively. Their specific values depending on $\delta_0^*$ and $\delta_1^*$ will be indicated in steps
I) to IV) below.

Throughout the rest of the paper we shall use $\varepsilon, \delta > 0$ to denote arbitrarily small positive
constants whose values might even change from line to line. Similarly, $K, c \in (0, \infty)$ are used
to denote positive constants that will be employed as generic upper bounds.

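Now, we study the terms $J_*$, $J_{**}$, and $J_i$, $i = 1, \ldots, k$.

The term $J_{**}$:

$$J_{**} = P(X > \Lambda^{-1}(\Lambda(b - s) - a_{**}))\, E\Big(\frac{g(s+X)}{g(s)}; X > \Lambda^{-1}(\Lambda(b - s) - a_{**})\Big) \le \frac{P^2(X > \Lambda^{-1}(\Lambda(b - s) - a_{**}))}{g(s)} = \frac{e^{2a_{**}}\, \bar F^2(b - s)}{g(s)}.$$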
A bound for $J_*$:

Proposition 2 Suppose the distribution function $F$ satisfies Assumption A or Assumptions
B1-3. Then, as $b - s \to \infty$,

$$E\Big(\frac{g(s+X)}{g(s)}; X \le b - s - \Lambda^{-1}(\Lambda(b - s) - a_*)\Big) \le 1 + (1 + o(1))\, \mu\, \frac{\partial g(s)}{g(s)}.$$

Therefore, for any $\delta_0^* > 0$, we can select $\eta_* > 0$ such that for all $b - s > \eta_*$,

$$E\Big(\frac{g(s+X)}{g(s)}; X \le b - s - \Lambda^{-1}(\Lambda(b - s) - a_*)\Big) \le 1 + (1 - \delta_0^*)\, \mu\, \frac{\partial g(s)}{g(s)}.$$

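Proof of Proposition 2. By Taylor's expansion,

$$\frac{g(s + X)}{g(s)} = 1 + X\, \frac{\partial g(s + \xi)}{g(s)},$$

where $\xi \in (0, X)$ (or $(X, 0)$). For all $s$ and $X$ such that $g(s) < 1$ and $g(s + X) < 1$,

$$X\, \partial g(s + \xi)/g(s) = 2X\, \bar F(b - s - \xi)\, G(b - s - \xi)/G^2(b - s) = 2X\, \frac{\bar F(b - s - \xi)}{\bar F(b - s)}\, \frac{G(b - s - \xi)}{G(b - s)}\, \frac{\bar F(b - s)}{G(b - s)}.$$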
Then, 

Y^^E {Xdg{s + 0/g{s); X<b-s- A-\A{b - s) - a.)) 
Note the following facts, 

nb-s-0 

F{b-s) - ' 

and by Lemma [2] (Assumption B) or the regularly variation property of G (Assumption A), 

G{b-s-0 ^ 
Gib-s) 

and by Lemma [3] and the fact that F is subexponential (Lemma |6]), 

Fib-s-QGib-s-O 
F{b-s) G{b-s) ' 

as $b - s \to \infty$. By the dominated convergence theorem,

$$\lim_{b - s \to \infty} \frac{G(b - s)}{\bar F(b - s)}\, E\left(X\, \partial g(s + \xi)/g(s); X \le b - s - \Lambda^{-1}(\Lambda(b - s) - a_*)\right) = 2\mu. \qquad (29)$$

Therefore, we can always choose the constants appropriately such that the conclusion of the
proposition holds. ■

As remarked in equation (11), the terms $J_i$, $i = 1, \ldots, k$, do not appear in the context of
Assumption A. We consider them in the context of Assumption B.
Bound for $J_i$, $2 \le i \le k - 1$:

Proposition 3 Suppose that Assumptions B1-3 hold. Then, for each $2 \le i \le k - 1$, we
have that for any $\alpha > 0$

$$J_i = o\left((b - s)^{-\alpha}\right)$$

as $b - s \to \infty$.

Proof. Thanks to Lemma 4, for each $x, y, z$ sufficiently large, we have

$$\Lambda(x) + \Lambda(y) - \Lambda(x + y + z) \ge \Lambda(x + y + z) \left( \Big(\frac{x}{x + y + z}\Big)^{\beta_0} + \Big(\frac{y}{x + y + z}\Big)^{\beta_0} - 1 \right). \qquad (30)$$

We first note that by repeatedly using results in Lemma 3

"J f{x)g{s + x) 

-f[x)dx 



^Kf^{x)G\h-s) 

G'^ip — s — x)f{x)dx 



A(b-s)-A(cj_i) r-c, 

< 1^1^ ^ / G^(h - s - a;)A(x)e-^(")rfx 

A(fe-s)-A(cj_i) 

< 4 F{c,-i){h - sfF\h - s - c,) 

= ^4^5 _ g)2g2A(b-s)-2A(cj_i)-2A(fe-.-c,) 

< etib - sf exp {-2A(6 - s) (a* ^ + (1 - a^f" " l) } = " ^)"" 



as $b - s \to \infty$ for each $\alpha > 0$. The last inequality is thanks to (20) and (30). The last
step (equality) follows from Lemma 3 and Assumption B1, which implies that the tail of $X$
decreases faster than any polynomial. ■

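A bound for $J_1$:

Proposition 4 Suppose that Assumptions B1-3 hold. Then, for each $\alpha > 0$ we have

$$J_1 = \int_{b - s - \Lambda^{-1}(\Lambda(b - s) - a_*)}^{c_1} \frac{f(x)\, g(s + x)}{f_1(x)\, g(s)}\, f(x)\, dx = o\left((b - s)^{-\alpha}\right),$$

as $b - s \to \infty$.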
Proof of Proposition 4. Use Lemma 3 and $\lim_{x \to \infty} \lambda(x) = 0$ and obtain
f-^i f{x)g{s + x) 



6-s-A-i(A(b-s)-a,) tifl{x)G'^{b — s) 



f{x)dx 



G^b - s 

< e^oib - sfP{X >b-s- A-\A{b - s) - a,)) f 
Also note that by Lemma 4,



6-s-A-i{A{6-s)-a.) 

^2A{b-s)-2A(b-s-x)-A{x)^^_ 

s-A-i(A(6--s)-a,) 



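$$\Lambda(x) + \Lambda(b - s - x) - \Lambda(b - s) \ge \Lambda(b - s) \left( \Big(\frac{x}{b - s}\Big)^{\beta_0} + \Big(\frac{b - s - x}{b - s}\Big)^{\beta_0} - 1 \right), \qquad (31)$$

and,

$$\Lambda(b - s) - \Lambda(b - s - x) \le \Lambda(b - s) \left(1 - (1 - x/(b - s))^{\beta_0}\right).$$

Therefore, for all $x \in [b - s - \Lambda^{-1}(\Lambda(b - s) - a_*), \sigma_1 (b - s)]$, with $\sigma_1$ selected according to
Lemma 9,

$$2\Lambda(b - s) - 2\Lambda(b - s - x) - \Lambda(x) \le \Lambda(b - s) \left\{ 2 - 2\Big(1 - \frac{x}{b - s}\Big)^{\beta_0} - \Big(\frac{x}{b - s}\Big)^{\beta_0} \right\} \le 0.$$

Together with Lemma 5, $P(X > b - s - \Lambda^{-1}(\Lambda(b - s) - a_*))$ decreases to zero faster than
any polynomial rate. The conclusion of the proposition follows. ■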

A bound for $J_k$:

Proposition 5 If Assumption B holds then for each $\alpha > 0$

$$J_k = \int_{c_{k-1}}^{c_k} \frac{f(x)\, g(s + x)}{f_k(x)\, g(s)}\, f(x)\, dx = o\left((b - s)^{-\alpha}\right)$$

as $b - s \to \infty$.

Proof of Proposition [H Note that 
g{s + x) p{x) 



^ ^G^(b-s)Mx 



-dx 



P{X e{b- s-Ck,b- s- Cfe„i]) / - 



g[s + x) p{x) 
G\b-s)f{b-s-x) ^ 



<elF{b-S-Cu) r ^)'^'(^) 2A(.-.)-2A(.)-A(6-.-.)^^_ 

' ^ A(o — s — x) 



Cfc- 



We note that $\sigma_1$ is small enough and $x \ge (1 - \sigma_1)(b - s)$ so that we can apply Lemma 9 to
conclude

$$2\Lambda(b - s) - 2\Lambda(x) - \Lambda(b - s - x) \le \Lambda(b - s) \left\{ 2 - 2\Big(\frac{x}{b - s}\Big)^{\beta_0} - \Big(\frac{b - s - x}{b - s}\Big)^{\beta_0} \right\} \le 0.$$

By Assumption B1, $1/\lambda(x)$ grows at most linearly in $x$, and $\bar F(b - s - c_k)$ decays faster
than any polynomial rate. We then have
the conclusion of the proposition. ■



Summary of estimates and implications for the design of the change of measure
selection. The previous bounds on $J_*$, $J_{**}$, and $J_i$, $i = 1, \ldots, k$ imply that we can choose
parameters and set up the algorithm as follows.

I If Assumption A holds, we choose $a_*$ and $a_{**}$ such that (14) holds. If Assumption B
holds, given $a_*, a_{**} > 0$, $\sigma_1 > 0$, and $a_j = a_{j-1} + \sigma_1/2$, chosen according to Lemma 9,
let

$$c_0 = b - s - \Lambda^{-1}(\Lambda(b - s) - a_*), \quad c_k = \Lambda^{-1}(\Lambda(b - s) - a_{**}),$$
$$c_j = a_j (b - s) \text{ for } j = 1, \ldots, k - 1.$$

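II Select $\delta_0^* \in (0, 1/4)$ and let $\eta_* > 0$ be large enough so that if $b - s > \eta_*$ then

$$\frac{J_*}{p_*} \le \frac{1}{p_*} + \frac{(1 - \delta_0^*)\,\mu}{p_*}\, \frac{\partial g(s)}{g(s)}.$$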

III Choose SI e (0, - + 6^y^y{k + l)^) such that ii b - s > r], for t], large 

enough 

' dgjs Y 
9is) 

for all i = 1, k. Note that the Jj terms are all zero for the regularly varying case. 



Ji < 515q 



The choice in III) is feasible because $\partial g(s)/g(s) = 2\bar F(b - s)/G(b - s)$ decreases at
most at a polynomial rate and the $J_i$ terms bounded in Propositions 3, 4 and 5 are smaller than any
polynomial rate. Both II) and III) can be satisfied simultaneously by choosing $\eta_*$ sufficiently
large. Now, with the selections in II) and III) we have that

J if: ^ J ^if: ^ J }^ 

P** Pk 



p* p* g[s) 

Now we must select $p_*$, $p_{**}$, and the $p_i$'s so that (33) is less than unity in order to satisfy (23).
Recall that $p_{**}$ represents the mixture probability associated to the occurrence of the rare
event in the next step. Therefore, it makes sense to select $p_{**}$ of order $\Theta(\bar F(b - s)/G(b - s))$
as $b - s \to \infty$. Motivated by this observation and given the analytical form of the equation
above we write

$$p_{**} = \min\{\theta\, \partial g(s)/g(s), \tilde\varepsilon\} = \min\{2\theta \bar F(b - s)/G(b - s), \tilde\varepsilon\} \qquad (34)$$

for some $\theta, \tilde\varepsilon > 0$ (the precise values of $\theta$ and $\tilde\varepsilon$ will be given momentarily) and let

$$p_i = \varepsilon_1 p_{**} \qquad (35)$$






for each $i = 1, \ldots, k$, for some $\varepsilon_1 > 0$ small enough to be defined shortly. This selection
of the $p_i$'s also makes intuitive sense because the corresponding mixture terms will give rise to
increments that are large, yet not large enough to reach the level $b$ of the random walk, and
therefore they correspond to "rogue paths", as we called them in the Introduction. In
addition, one can always choose $\eta_*$ large enough such that $p_{**} < \tilde\varepsilon$ for all $b - s > \eta_*$. Given
these selections we obtain

$$p_* = 1 - p_{**} - k \varepsilon_1 p_{**}. \qquad (36)$$
We then conclude that if + < 51/2 < 1/4 and e < 51/2, then 



P* p** ^ Pk 

1=1 



< 1 + j9,, (1 + ks,) (1 - 5;)-' + ^ ^ °V p*. + e'"'"^ + k5l5, 



^ I ^2a.. P** I , » » P** 



ke,) (1 - 5ir' + ^-r^/i +-r7I^ + k 



9 ^ 4:9^K e^ei 



Now choose $\varepsilon_1 = \delta_0^*/(k + 1)$ and then select $\theta = -\mu(1 - \delta_0^*)/(1 + \delta_0^*)^2$. Then we note that our
selection of $\delta_1^*$ guarantees $\delta_1^* \le \theta^2 \varepsilon_1 k^{-1}$. Finally it is required that $\kappa \ge e^{2a_{**}}/[4\theta^2\delta_0^*]$. Note
that the selection of $\delta_0^*, \delta_1^* > 0$ requires that $b - s > \eta_*$ for $\eta_* > 0$ sufficiently large, which
is guaranteed whenever $g(s) < 1$ and $\kappa$ is sufficiently large. So, the selection of $\kappa$ might
possibly need to be increased in order to satisfy all the constraints. All these selections in
place yield (using the fact that $\delta_0^* < 1/4$)

I±+-I^ + Y,-<l+P**{il + 5lf - (1 + 5lf + 25*) < l+p,Jl {51 - 1) < 1. 

P^ P*=¥ ^ Pk 

The various parameter selections based on the previous discussion are summarized next.

IV Select $\varepsilon_1 = \delta_0^*/(k + 1)$, $\tilde\varepsilon = (\delta_0^*)^2$ (this guarantees $p_{**}(1 + k\varepsilon_1) \le \delta_0^*/2$) and $\theta =
-\mu(1 - \delta_0^*)/(1 + \delta_0^*)^2$. Set $p_{**}$, $p_i$ for $i = 1, \ldots, k$ and $p_*$ according to (34), (35) and (36)
respectively. Then, choose $\kappa$ large enough so that $\kappa \ge e^{2a_{**}}/[4\theta^2\delta_0^*]$ and at the same
time $g(s) < 1$ implies $b - s > \eta_*$ with $\eta_*$ also appearing in II) above.

We now can provide a precise description of the importance sampling scheme. Assume
that the selection procedure indicated from I) to IV) above has been performed and let
$S_0 = 0$. Suppose that the current position at time $k$, namely $S_k$, is equal to $s$ and that
$\tau_b > k$. We simulate the increment $X_{k+1}$ according to the following law. If $g(s) < 1$ then we
sample $X_{k+1}$ with the mixture density in (10). Otherwise, if $g(s) = 1$ we sample $X_{k+1}$ with
density $f(\cdot)$. The corresponding importance sampling estimator is precisely

$$Z_b = I(\tau_b < \infty) \prod_{i=1}^{\tau_b} r_{S_{i-1}}(S_i - S_{i-1}). \qquad (37)$$

Note that we have not discussed the termination of the algorithm, that is, the expected value of
$\tau_b$ under the proposed importance sampling distribution. Indeed, this is an issue that will be
studied in the next section. Here we are only interested in the variance analysis of $Z_b$.
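
The sampling loop just described can be sketched in code. The example below is a deliberately simplified illustration and not the mixture family of this paper: it assumes shifted Pareto increments with tail index 2.5 and mean $-1$, uses only two mixture components (the nominal density and a "large jump" component conditioned on $X > a(b-s)$), replaces the condition $g(s) < 1$ by a fixed threshold `eta`, and truncates very long paths. All constants and function names are hypothetical.

```python
import random

ALPHA = 2.5                          # assumed tail index (iota in the text)
SHIFT = 1.0 + 1.0 / (ALPHA - 1.0)    # centering so that the mean increment is -1
MU = -1.0                            # resulting drift of the random walk


def tail(x):
    """P(X > x) for X = Y - SHIFT, where P(Y > y) = (1 + y)^(-ALPHA), y >= 0."""
    return 1.0 if x <= -SHIFT else (1.0 + x + SHIFT) ** (-ALPHA)


def sample_above(t, rng):
    """Draw X conditional on X > t by inversion (requires t >= -SHIFT)."""
    u = 1.0 - rng.random()           # uniform on (0, 1]
    return (1.0 + t + SHIFT) * u ** (-1.0 / ALPHA) - 1.0 - SHIFT


def one_replication(b, rng, a=0.9, eta=5.0, max_steps=10**6):
    """One draw of I(tau_b < infinity) * prod r under the simplified mixture."""
    s, weight = 0.0, 1.0
    for _ in range(max_steps):
        gap = b - s
        if gap <= eta:
            x = sample_above(-SHIFT, rng)        # no change of measure near b
        else:
            p_big = min(2.0 * abs(MU) * (ALPHA - 1.0) / gap, 0.5)
            c = a * gap
            if rng.random() < p_big:
                x = sample_above(c, rng)         # "large jump" component
            else:
                x = sample_above(-SHIFT, rng)    # "regular" component (nominal law)
            q_over_f = (1.0 - p_big) + (p_big / tail(c) if x > c else 0.0)
            weight /= q_over_f                   # multiply by r_s(x) = f(x) / q_s(x)
        s += x
        if s > b:
            return weight
    return 0.0                                   # truncated path, counted as no ruin


def estimate(b, n, seed=0):
    """Average of n independent replications of the estimator."""
    rng = random.Random(seed)
    return sum(one_replication(b, rng) for _ in range(n)) / n


def fluid_approximation(b):
    """-G(b) / mu for the assumed tail, cf. the Pakes-Veraverbeke asymptotics."""
    return (1.0 + b + SHIFT) ** (-(ALPHA - 1.0)) / ((ALPHA - 1.0) * abs(MU))
```

A call such as `estimate(50.0, 2000)` can be compared with `fluid_approximation(50.0)` as a rough sanity check; the two need not agree closely at moderate $b$, since the fluid approximation is only asymptotic. The large-jump probability in the sketch is taken of order $2|\mu|(\iota - 1)/(b - s)$, mimicking the order of $p_{**}$ in (34) under regular variation.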
Proof of Theorem 1. We must show that the estimator $Z_b$ defined in (37) is strongly
efficient for estimating $u(b)$. Our discussion summarized in the selection process from I) to
IV) above indicates that $g(\cdot)$ is a valid Lyapunov function. Therefore we have that

$$E^Q Z_b^2 \le g(0) \le \kappa\, G^2(b).$$

Hence, according to (17),

$$\sup_{b \ge 1} \frac{E^Q Z_b^2}{u^2(b)} < \infty.$$

5 Controlling the expected termination time 

As mentioned previously, if $Z_b$ is a strongly efficient estimator for $u(b)$, in order to compute
$u(b)$ with $\varepsilon$ relative error with at least $1 - \delta$ probability, one needs to generate $O(\varepsilon^{-2}\delta^{-1})$
(uniformly in $b$) i.i.d. copies of $Z_b$. The concept of strong efficiency by itself does not
capture the complexity of generating a single replication of $Z_b$. In this section we will
further investigate the computational cost of generating $Z_b$. We shall assume that sampling
from the densities $q_s(\cdot)$ or $f(\cdot)$ takes at most a given constant computational cost, so the
analysis reduces to finding a suitable upper bound for $E^Q\tau_b$.
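
The $O(\varepsilon^{-2}\delta^{-1})$ replication count follows from Chebyshev's inequality: if $E^Q Z_b^2 \le K u^2(b)$, then the sample mean of $n$ copies has relative error above $\varepsilon$ with probability at most $K/(n\varepsilon^2)$. A minimal sketch of the resulting sample-size rule, with a hypothetical function name and with $K$ assumed to be a known bound, is:

```python
import math


def replications_needed(rel_second_moment_bound, eps, delta):
    """Smallest n with K / (n * eps^2) <= delta, where K bounds E[Z^2] / u^2."""
    return math.ceil(rel_second_moment_bound / (eps ** 2 * delta))
```

Multiplying this count by a bound on the expected number of steps per replication, such as $\rho_0 + \rho_1 b$, gives the total expected cost discussed in this section.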

We first assume that $F$ is a regularly varying distribution. We will see that if I) to IV)
and also V) below are satisfied then the expected termination time is $O(b)$. The key message
is that we can always select $a_{**}, \delta_0^* > 0$ sufficiently small in order to satisfy both Lyapunov
inequalities in Lemmas 7 and 8.

V If Assumption A holds, let $\eta_*$ be large enough so that if $g(s) < 1$ (i.e. $b - s > \eta_* =
G^{-1}(\kappa^{-1/2})$) then

$$\frac{\bar F(b - s)}{G(b - s)} \ge \frac{(\iota - 1)(1 - \delta_0^*)}{b - s}.$$

We also require that $a_{**}, \delta_0^* > 0$ are sufficiently close to zero such that

$$\delta_2^* = 2(\iota - 1)\, \frac{(1 - \delta_0^*)^2}{(1 + \delta_0^*)^2}\, e^{-a_{**}} - 1 - 2\left(1 - e^{-2a_{**}/\iota}\right)(\iota - 1) > 0$$

with $\iota > 1.5$.
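
The role of the threshold $\iota = 1.5$ in item V) can be seen by evaluating the drift margin numerically. The sketch below assumes the reading $\delta_2^* = 2(\iota - 1)\frac{(1 - \delta_0^*)^2}{(1 + \delta_0^*)^2} e^{-a_{**}} - 1 - 2(1 - e^{-2a_{**}/\iota})(\iota - 1)$, which is a reconstruction consistent with the estimates used in the proof of Proposition 6 rather than a verbatim formula; the function name is hypothetical.

```python
import math


def delta2_star(iota, a_double_star, delta0_star):
    """Drift margin of item V) under the assumed (reconstructed) formula."""
    shrink = (1.0 - delta0_star) ** 2 / (1.0 + delta0_star) ** 2
    gain = 2.0 * (iota - 1.0) * shrink * math.exp(-a_double_star)
    loss = 2.0 * (1.0 - math.exp(-2.0 * a_double_star / iota)) * (iota - 1.0)
    return gain - 1.0 - loss
```

As $a_{**}$ and $\delta_0^*$ tend to zero the margin tends to $2(\iota - 1) - 1$, which is positive exactly when $\iota > 1.5$; for example the margin is positive at $\iota = 2$ and negative at $\iota = 1.4$ when both small parameters equal 0.01.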

Proposition 6 Suppose that Assumption A holds and $\iota > 1.5$. Then, the selection indicated
in I) to V) yields both Theorem 1 and

$$E^Q(\tau_b) \le \rho_0 + \rho_1 b,$$

for $\rho_0, \rho_1 \in (0, \infty)$ independent of $b$.

Proof of Proposition 6. We will use Lemma 8 to finish the proof. We propose

$$h(s) = [\rho + b - s]\, I(s < b),$$
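for some $\rho > 0$. First we note that

$$E^Q\left(b - s - X; X \in (\Lambda^{-1}(\Lambda(b - s) - a_{**}), b - s]\right) \qquad (38)$$
$$= p_{**}\, \frac{P(X \in (\Lambda^{-1}(\Lambda(b - s) - a_{**}), b - s])}{P(X > \Lambda^{-1}(\Lambda(b - s) - a_{**}))}\, E\left(b - s - X \mid X \in (\Lambda^{-1}(\Lambda(b - s) - a_{**}), b - s]\right).$$

Recall that

$$p_{**} = \min\{2\theta \bar F(b - s)/G(b - s), \tilde\varepsilon\} = \frac{2\theta(\iota - 1)}{b - s}\,(1 + o(1)) \qquad (39)$$

as $b - s \to \infty$, where $\theta = -\mu(1 - \delta_0^*)/(1 + \delta_0^*)^2$. Therefore, we can select $\eta_* > 0$ large enough
so that if $b - s > \eta_*$

$$\frac{2|\mu|(\iota - 1)(1 - \delta_0^*)^2}{(b - s)(1 + \delta_0^*)^2} \le p_{**} \le \frac{2|\mu|(\iota - 1)}{b - s}.$$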

Now, note that $\eta_*$ can be chosen sufficiently large so that if $a = e^{-2a_{**}/\iota}$, then

$$\exp\left(-\Lambda(b - s) + \Lambda(a(b - s))\right) = \frac{P(X > b - s)}{P(X > a(b - s))} \le \exp(-a_{**})$$

as long as $b - s > \eta_*$. Therefore,

$$X > \Lambda^{-1}(\Lambda(b - s) - a_{**})$$

implies $X > a(b - s)$ and we have that

$$E\left(b - s - X \mid X \in (\Lambda^{-1}(\Lambda(b - s) - a_{**}), b - s]\right) \le (1 - a)(b - s). \qquad (40)$$

Together with (38), (39), and (40), if $b - s > \eta_*$ we obtain

$$E^Q\left(b - s - X; X \in (\Lambda^{-1}(\Lambda(b - s) - a_{**}), b - s]\right) \le 2|\mu|(1 - a)(\iota - 1).$$

The previous estimates imply that by choosing $\eta_* > 0$ large enough we can guarantee that
for all $b - s > \eta_*$ we have

$$E^Q(h(s + X)) = E^Q(\rho + b - s - X; s + X < b)$$
$$\le (1 - Q(X > b - s))(\rho + b - s - \mu + o(1)) + 2|\mu|(1 - a)(\iota - 1)$$
$$= (1 - p_{**} e^{-a_{**}})(h(s) - \mu + o(1)) + 2|\mu|(1 - a)(\iota - 1).$$

By noting that < \fi\, if b — s > rj^, and r/^, is selected large enough we obtain that 

E«(/i(s + X)) 

< h{s) - p - p.^e-^-h (s) + 2|/i| (1 - a) (i - 1) + o(l) 

2u(i-l)(l-5*)2 „ ^ ^ ^ ^ . ^
The above inequality holds for all $\rho > 0$ provided that $b - s > \eta_* = G^{-1}(\kappa^{-1/2})$ so that
$b - s > \eta_*$ if and only if $g(s) < 1$. Since $\iota > 1.5$, one can choose $a_{**}$ and $\delta_0$ sufficiently small
such that ^ 

we conclude that 

E^{h{s + X)) <h{s)+fxS; 

as long as $g(s) < 1$. Now, if $g(s) = 1$ (i.e. if $0 < b - s \le \eta_*$) we do not apply the change of
measure and therefore

E^{h{s + X))=E[p + b-s-X;X <b-s] 

< h{s) - E{X\X < 0) - pP (X > T],) . 

Given the selection of $\kappa$ (and therefore of $\eta_* = G^{-1}(\kappa^{-1/2})$) we can choose $\rho$ large such that

-E{X\X < 0) - pP (X > 7],) < fi6; < 0. 

Hence, 

E^n < h{0)/\fi\S;. 
Thereby, the conclusion of Lemma [8] follows by redefining the constants. ■ 



Remark 6 The previous result concerning the condition $\iota > 1.5$ raises a couple of natural
questions. First, what is special about a tail index $\iota = 1.5$? What would be required in order
to obtain both strong efficiency and $E^Q\tau_b = O(b)$ assuming only $\iota > 1$? We believe that
the previous result is basically optimal. We do not pursue this claim with full rigor here but
provide an argument showing why we expect this to be the case. First, Theorem \E implies
the approximation

$$P(b\delta n < \tau_b \le b\delta(n+1) \mid \tau_b < \infty) = [P(Y_0 > \delta|\mu|n(\iota-1)) - P(Y_0 > \delta|\mu|(n+1)(\iota-1))](1+o(1))$$

as $b \to \infty$ for any $\delta > 0$. Even if we could apply importance sampling directly to $\tau_b$ (rather
than doing it through the $X_j$'s) it would be reasonable to select $Q(\cdot)$ so that

$$Q(b\delta n < \tau_b \le b\delta(n+1)) = c_1(\delta)\, n^{-\gamma_1}(1+o(1))$$

as $b \to \infty$. Since we wish to have $E^Q\tau_b < \infty$ we should impose the constraint $\gamma_1 > 2$. Now,
we have that

$$P(Y_0 > \delta|\mu|n(\iota-1)) - P(Y_0 > \delta|\mu|(n+1)(\iota-1)) = \delta|\mu|(\iota-1)(1+\delta|\mu|n)^{-\iota}(1+o(1))$$

as $n \to \infty$. On the other hand, strong efficiency imposes the constraint that

g ( Qib6n<n<b6in\l)) ) ^ ^^^^ < ^ ^^^^ ^ < ^ ^''^
which suggests

$$\sum_{n=1}^{\infty} n^{-2\iota+\gamma_1} < \infty. \quad (42)$$

Consequently, we also must have $2\iota > \gamma_1 + 1$. Combined with the previous constraint (i.e.
$\gamma_1 > 2$), it yields $\iota > 3/2$.

We will show that if $\iota > 1$ we can control $1+\gamma$ relative moments (for $\gamma$ small enough) and
still keep $E^Q\tau_b = O(b)$. However, before we do so, in order to complete the argument for
the proof of Theorem 3 we will continue working with $\gamma = 1$ in the context of Assumption
B.

Proposition 7 If Assumptions B1-3 hold, we assume there exists $\delta > 0$ and $\beta \in [0, \beta_0]$ such
that $\lambda(x) \ge \delta x^{\beta-1}$ for $x$ sufficiently large. Then, there exist $a_*$, $a_{**}$, $p_*$, $p_{**}$, $p_j$, $j = 1, \ldots, k$,
such that Theorem U\ holds and, in addition,

for $\rho_0$ and $\rho_1$ sufficiently large.

Proof of Proposition 7. Let $\beta \in (0, \beta_0)$ and consider the Lyapunov function,

$$h(s) = [\rho + (b-s)^{1-\beta}]\, I(s < b).$$

For all $\varepsilon > 0$,

$$E^Q(h(s+X)) \le Q(X \le (1-\varepsilon)(b-s))\, E^Q(\rho + (b-s-X)^{1-\beta} \mid X \le (1-\varepsilon)(b-s))$$
$$+ (\rho + \varepsilon^{1-\beta}(b-s)^{1-\beta})\, Q((1-\varepsilon)(b-s) < X \le b-s).$$

With Assumptions B1-3, if $\beta = 0$, using L'Hopital's rule on a subsequence, we have

$$\lim_{x\to\infty} \frac{xF(x)}{G(x)} = \lim_{x\to\infty} \frac{-F(x) + x\lambda(x)F(x)}{F(x)} = \infty;$$

if $\beta \in (0, \beta_0)$,

$$\lim_{x\to\infty} \frac{x^{1-\beta}F(x)}{G(x)} = \lim_{x\to\infty} x^{1-\beta}\lambda(x) - (1-\beta)x^{-\beta} \ge \delta.$$

There exist $\varepsilon, \delta' > 0$ small enough and $\eta_*$ sufficiently large such that for all $b - s > \eta_*$ and
all $\rho > 0$,

$$E^Q(h(s+X))$$
$$\le (1 - 2\varepsilon\delta(b-s)^{\beta-1})(\rho + (b-s)^{1-\beta} - (1+\delta')(1-\beta)(b-s)^{-\beta}\mu) + 2\varepsilon\delta(\rho + \varepsilon^{1-\beta}(b-s)^{1-\beta})(b-s)^{\beta-1}$$
$$\le (1 - 2\varepsilon\delta(b-s)^{\beta-1})(h(s) - (1+\delta')(1-\beta)(b-s)^{-\beta}\mu) + 2\varepsilon\delta(\rho + \varepsilon^{1-\beta}(b-s)^{1-\beta})(b-s)^{\beta-1}$$
$$\le h(s) - \varepsilon\delta.$$

The above derivation is true for all $\beta > 0$ satisfying the conditions in the proposition. When
$\beta = 0$, due to Assumption B1, one can always choose $\delta$ large such that $2\varepsilon\delta > 3|\mu|$. This
allows us to control the contribution of the term $(1+\delta')(1-\beta)(b-s)^{-\beta}\mu$ in the above display.
Therefore, this derivation is true for all $\beta \in [0, \beta_0]$.

On the other hand, if $0 < b - s \le \eta_*$ and we select $\eta_* = G^{-1}(\kappa^{-1/2})$ so that $g(s) < 1$ if and
only if $b - s > \eta_*$, we obtain that

E'^h{s + X) 

= Eh{s + X)<p+{h- sf~^ -pP{X> r],) + E{{b - s - X)^'^ - {b - s)^^^ -X <b-s). 
Clearly, once $\eta_*$ has been selected we can pick $\rho$ large enough so that

-pP{X>r],)+ sup E{{b- s - X)^-'^ - {b- s)^-'^ ■,X <b- s) < -6/2. 

0<b~s<rit 

Therefore, 

E^{h{s + X)) < h{s)-6/2 
and we conclude the result by applying Lemma 8. ■

Proof of Theorem 3. The conclusion follows immediately from Propositions 6 and 7. ■

Finally, we come back to the problem of controlling $(1+\gamma)$-th moments in order to
guarantee $E^Q\tau_b = O(b)$ when $F$ is regularly varying with $\iota > 1$. This corresponds to
Theorem 4. The next proposition is central to the proof.

Proposition 8 Suppose that Assumption A holds and that $\iota \in (1, 1.5]$. Then, we can choose
$a_*$, $a_{**}$, $p_*$, and $p_{**}$, such that for each $\gamma \in (0, (\iota-1)/(2-\iota))$ there exists a $K > 0$ with

$$E^Q Z_b^{1+\gamma} \le K u(b)^{1+\gamma}$$

and $E^Q\tau_b = O(b)$ as $b \to \infty$.

Remark 7 With a very similar argument as in Remark 6, we believe that the bound
$1 + (\iota-1)/(2-\iota)$ is the highest moment that one can control while maintaining $O(b)$ expected
termination time. An analogous constraint to (42) is that

$$\sum_{n=1}^{\infty} n^{-(1+\gamma)(\iota-\gamma_1)-\gamma_1} < \infty.$$

This implies that $\gamma < (\iota-1)/(\gamma_1-\iota) < (\iota-1)/(2-\iota)$. Note that it is necessary to impose
$\gamma_1 > 2$ to have $O(b)$ expected termination time.
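The exponent bookkeeping behind this constraint can be checked mechanically. The short Python sketch below is only an illustration: the function names are hypothetical, and it simply encodes the p-series criterion (a series $\sum_n n^{-e}$ converges exactly when $e > 1$) applied to the exponent $e = (1+\gamma)(\iota-\gamma_1)+\gamma_1$ of the display in Remark 7. Taking $\gamma = 1$ recovers the condition $2\iota > \gamma_1 + 1$ of Remark 6.

```python
def series_exponent(gamma, iota, gamma1):
    """Exponent e such that the constraint series reads sum_n n**(-e)."""
    return (1.0 + gamma) * (iota - gamma1) + gamma1


def is_summable(gamma, iota, gamma1):
    """A p-series sum_n n**(-e) converges if and only if e > 1."""
    return series_exponent(gamma, iota, gamma1) > 1.0


def gamma_upper_bound(iota, gamma1=2.0):
    """Supremum of admissible gamma, (iota - 1) / (gamma1 - iota).

    Assumes 1 < iota < gamma1; gamma1 = 2 is the boundary value, since the
    text requires gamma1 > 2 strictly.
    """
    if not (1.0 < iota < gamma1):
        raise ValueError("need 1 < iota < gamma1")
    return (iota - 1.0) / (gamma1 - iota)


if __name__ == "__main__":
    print(gamma_upper_bound(1.25))        # boundary case gamma1 = 2
    print(is_summable(0.30, 1.25, 2.0))   # gamma below the bound
    print(is_summable(0.34, 1.25, 2.0))   # gamma above the bound
```

For instance, with $\iota = 1.25$ and $\gamma_1$ at the boundary value 2, `gamma_upper_bound` returns 1/3, in agreement with $(\iota-1)/(2-\iota)$.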

Proof of Proposition 8. The strategy is completely analogous to the case of $\gamma = 1$. We
define

$$g_\gamma(s) = \min\{\kappa G(b-s)^{1+\gamma}, 1\}.$$

We need to verify the Lyapunov inequality only on $g_\gamma(s) < 1$ (as before the case $g_\gamma(s) = 1$
is automatic). We select

$$p_{**} = \min\{\theta\, \partial g_\gamma(s)/g_\gamma(s), \varepsilon\}$$

for $\varepsilon$ sufficiently small. Applying Lemma 7 we need to show that



^ + ^<1, (43) 



where J* and J** are redefined as 

J, = P{X <b-s- A-\A{b -s)- a,,)yE (^^I^±^; X<b-s- A-\A{b - s) - a,)^ 

J„ = p{X > A-\A{b -s)- a.,)yE ( X > A-\A{b - s) - a,,)] . 

Note that the $J_i$ terms analogous to (26) and (27) are all zero. At the same time, we need
to make sure that we can find $\rho > 0$ such that if

$$h(s) = [\rho + (b-s)]\, I(b - s > 0)$$

then

$$E^Q h(s+X) \le h(s) - \varepsilon \quad (44)$$

for some $\varepsilon > 0$ if $b > s$.

Inequality (43) can be obtained following the same steps as we did in I) to IV) in the
previous section. First we note that if $\eta_* = G^{-1}(\kappa^{-1/(1+\gamma)})$ is large enough (or equivalently
$\kappa$ is sufficiently large)

J,. ^ P{X > A-i(A(6 - s) - a„))^+' _ e'^"(T+i)F {b - s) 



pj* ~ g{s)p2^ k{1 + 9^ G {b - s)' 

Also, for any $\delta > 0$ we can ensure that if $\eta_*$ is large enough and if $b - s > \eta_*$ then

$$\frac{\theta(1+\gamma)(\iota-1)(1-\delta)}{b-s} \le \frac{\theta(1+\gamma)F(b-s)}{G(b-s)} \le \frac{\theta(1+\gamma)(\iota-1)(1+\delta)}{b-s}$$

and we also can ensure that

J * 



(1 -p^^y 



< (1 + 7(1 + S)p,,)E (^^2i^±p.. x<b-s- A~\A{b -s)- a,)^ . 



A similar development to that of Proposition 2 yields that $\eta_*$ can be chosen so that if
$b - s > \eta_*$,

E ( ^lli±^; X<b-s- A-(A(6 - .) - a.)] < 1 + Ml - S)^''^'^ 



Therefore, 



^7 (^) /V ^7 (' 
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and then 



+ 



G{b-s) J \ G{b 



K{l + -fpe^ G{b-s) 

We then can select 9 = (1 — 6Y/['j{l + 6)], a** < 6 and k sufficiently large such that the 
right hand side of the above display is less than one. At the same time, the analysis required
to enforce (44) is similar to that of Proposition 6. We, therefore, omit the details. The key
fact is now that

(l+7)M^-l)(l-^f ^ 
7 (5 - s) (1 + 5) - ^" 

and now we need to enforce 

(l±2Kii:M!e-".. - 1 - (1 H- ,,(1 -.)(.- 1) > 0, 

7(1 + d) 

where a = e~^'^**/\ This can always be done if we choose 7 < (i — l)/(2 — i) and 5, a** > 
sufficiently small. ■ 

Now we provide the proof of Theorem 4.
Proof of Theorem 4. From the result in Proposition 8, the $(1+\gamma)$-th moment of the
estimator and $E^Q\tau_b$ are properly controlled. We need to bound the total computation time
to achieve prescribed relative accuracy. Let $W_1, W_2, \ldots$ be a sequence of non-negative i.i.d.
random variables with unit mean and suppose that $EW_i^{1+\gamma} \le K$ for $\gamma > 0$. Define
$R_n = (W_1 + W_2 + \cdots + W_n)/n$ and note that



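$$P(|R_n - 1| > \varepsilon) \le P\Big(|R_n - 1| > \varepsilon,\ \max_{i \le n} W_i \le n\Big) + P\Big(\max_{i \le n} W_i > n\Big).$$

Now using Chebyshev's inequality we have that

$$P\Big(\max_{i \le n} W_i > n\Big) \le nP(W_i > n) \le \frac{K}{n^{\gamma}}.$$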

On the other hand, given maxj<„ Wi < n, Wi's are still i.i.d. and 

maxTy- < n 1< ^ ^^''^^ " + ^^^^ - ^ ^^'^^^^ " + ^^^^ 



P { \Rn M> £ ' ' ^^2p (yy. < 

The $O(1)$ term in the above display is in fact $(E(W_i \mid W_i \le n) - 1)^2$. Then, we have that for
$\gamma \in (0,1)$

$$E(W_i^2 I(W_i \le n)) = 2E\Big(\int_0^{W_i} t\,dt\; I(W_i \le n)\Big) \le 2\int_0^n tP(W_i > t)\,dt \le 2K\int_0^n \frac{1}{t^{\gamma}}\,dt = \frac{2K}{1-\gamma}\, n^{1-\gamma}.$$

Therefore, for $n$ sufficiently large we have that

$$P\Big(|R_n - 1| > \varepsilon,\ \max_{i \le n} W_i \le n\Big) \le \frac{3K}{(1-\gamma)\varepsilon^2 n^{\gamma}}.$$

Thus, we have that

$$P(|R_n - 1| > \varepsilon) \le \frac{3K}{(1-\gamma)\varepsilon^2 n^{\gamma}} + \frac{K}{n^{\gamma}} \le \frac{4K}{(1-\gamma)\varepsilon^2 n^{\gamma}}.$$

Applying these considerations to $W_n = Z_b/u(b)$ and letting $4K/[(1-\gamma)\varepsilon^2 n^{\gamma}] \le \delta$ we obtain
the conclusion of the theorem. ■
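The last step of the proof amounts to solving $4K/[(1-\gamma)\varepsilon^2 n^{\gamma}] \le \delta$ for $n$. A minimal Python sketch of that calculation follows. The function names are hypothetical, $K$ stands for the assumed bound on $EW_i^{1+\gamma}$, and the bound in the proof is only claimed for $n$ sufficiently large, so the returned value should be read as indicative rather than exact.

```python
import math


def failure_bound(K, gamma, eps, n):
    """Right-hand side 4K / ((1 - gamma) * eps^2 * n^gamma) from the proof."""
    return 4.0 * K / ((1.0 - gamma) * eps ** 2 * n ** gamma)


def replications_needed(K, gamma, eps, delta):
    """Smallest integer n (up to rounding) with failure_bound(...) <= delta."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("the bound is derived for gamma in (0, 1)")
    if K <= 0.0 or eps <= 0.0 or not 0.0 < delta < 1.0:
        raise ValueError("need K > 0, eps > 0 and delta in (0, 1)")
    n_real = (4.0 * K / ((1.0 - gamma) * eps ** 2 * delta)) ** (1.0 / gamma)
    return math.ceil(n_real)


if __name__ == "__main__":
    n = replications_needed(K=1.0, gamma=0.5, eps=0.1, delta=0.05)
    print(n, failure_bound(1.0, 0.5, 0.1, n))
```

The required number of replications grows like $(\varepsilon^{-2}\delta^{-1})^{1/\gamma}$, so a smaller controllable moment order $\gamma$ translates into a larger sample size, while the cost per replication stays $O(b)$.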

6 Approximation in total variation and conditional limit 
theorems 

6.1 Approximation of the random walk up to $\tau_b$

We will need the following lemma for the proof of approximation in total variation. 

Lemma 10 Let $Q_0$ and $Q_1$ be probability measures defined on the same $\sigma$-field $\mathcal{F}$ such that
$dQ_1 = M^{-1}dQ_0$ for a positive r.v. $M > 0$. Suppose that for some $\varepsilon > 0$,
$E^{Q_1}(M^2) = E^{Q_0}M \le 1 + \varepsilon$. Then,

$$\sup_A |Q_1(A) - Q_0(A)| \le \varepsilon^{1/2}.$$

Proof of Lemma 10. Note that

$$|Q_1(A) - Q_0(A)| = |E^{Q_1}(1 - M; A)| \le E^{Q_1}(|M - 1|) \le [E^{Q_1}(M-1)^2]^{1/2} = (E^{Q_1}M^2 - 1)^{1/2} \le \varepsilon^{1/2}.$$

■
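Lemma 10 is easy to check numerically on a finite sample space. The Python sketch below is an illustration only: the function name and the two probability vectors are made up, and $M$ is taken to be the ratio of the probability mass functions of $Q_0$ and $Q_1$, so that $dQ_1 = M^{-1}dQ_0$.

```python
import math


def tv_and_bound(q0, q1):
    """Return (sup_A |Q1(A) - Q0(A)|, sqrt(E^{Q1} M^2 - 1)) with M = q0 / q1."""
    if len(q0) != len(q1):
        raise ValueError("q0 and q1 must have the same length")
    if any(b <= 0.0 for b in q1):
        raise ValueError("q1 must be strictly positive")
    # On a finite space the supremum over events is half the L1 distance.
    tv = 0.5 * sum(abs(a - b) for a, b in zip(q0, q1))
    # E^{Q1} M^2 = sum_i q1_i * (q0_i / q1_i)^2 = sum_i q0_i^2 / q1_i.
    second_moment = sum(a * a / b for a, b in zip(q0, q1))
    return tv, math.sqrt(max(second_moment - 1.0, 0.0))


if __name__ == "__main__":
    tv, bound = tv_and_bound([0.5, 0.3, 0.2], [0.4, 0.4, 0.2])
    print(tv, bound)  # the first number never exceeds the second
```

In this example $E^{Q_1}M^2 = 1.05$, so $\varepsilon = 0.05$, and the total variation distance 0.1 sits below $\varepsilon^{1/2} \approx 0.224$.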

Also, it is not hard to verify that by letting $P^{(b)}(\cdot) = P(\cdot \mid \tau_b < \infty)$ we have

dQ P{Tb < 00) ' 

Then, it is sufficient to show that for e arbitrarily small there exists b sufficiently large 
depending on e, 

E'^Zl < + 

Theorem 8 Suppose that Assumption A or B1-B3 hold. For any $\varepsilon > 0$, there exists $\eta_* > 0$
such that for all $b > \eta_*$, there exists a choice of $p_*$, $p_{**}$, $p_j$, $j = 1, \ldots, k$ such that the
corresponding estimator satisfies

$$E^Q Z_b^2 \le (1+\varepsilon)u^2(b). \quad (45)$$

Therefore, the importance sampling distribution converges in total variation to the conditional
distribution of the random walk given $\{\tau_b < \infty\}$, as $b \to \infty$.

Proof of Theorem 8. Given $\varepsilon, \varepsilon' > 0$ small, we consider $\kappa > 0$ and functions

$$\gamma(s) = \begin{cases} 1 + 5\varepsilon + \kappa s^{1+\varepsilon'}/b^{1+\varepsilon'}, & s > 0, \\ 1 + 5\varepsilon, & s \le 0, \end{cases}$$

$$g(s) = \min\{1, \mu^{-2}\gamma(s)G^2(b-s)\}.$$

Let $\eta_* = \sup\{b - s : g(s) = 1\}$. We can easily see that $\eta_* \to \infty$ as $\kappa \to \infty$. Also,

$$1 + 5\varepsilon \le \gamma(s) \le \kappa + 1 + 5\varepsilon,$$

for all $s \le b$. We proceed with a similar development as in the previous section. We adopt
the same notation as in (24), (25), (26), and (27). Since $\gamma(s)$ is bounded, results as in
Propositions 3, 4, and 5 still hold. In addition, we can choose $a_{**}$ small enough such that

^ P\X>A~\A{b-s)-a.^)) , 

P**g{s) p**g{s) 

There is one last term, namely J*. Note that 

According to the proof of Proposition 2 (more specifically (29)),

^ {^~G^W^' ^ < ^ - ^ - A-'(A(6 -s)- a,)^ < 1 + (2/i + o(l))F(6 - - s). 

as 6 — s — y oo. Now, we consider the term 

For all > s > ¥' and s + X > 0, 

Therefore, for b > s > ¥' , by dominated convergence, 

^''■')^ (^^W^ - l) ; A- < . - . - A-(A,. - „) - ^ .(1 H- 

as 6 — s — oo. For s <W\ 

= 0(6-1-^'+"") = o{F{b - - s)) 
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as 6 oo uniformly over s < ¥' . Consequently, it follows that 

E (^^^^^; x<b-s- A-\Aib -s)- a,)] < 1 + (2^ + o{l))F{b - s)/G{b - s), 

as 6 — s — 7- oo. We choose, 

p** = minje, —(1 — e)^F{b — s)/G{b — s)}, pj = e^p^^. 
To be consistent with the previous notations, we let 



Then, 



E 



g{s + X) 



rs{X) 



e 



< 1 + (1 - £ + 0(5))/i 



(46) 



G{b-s) 



o{l)ke-^F{b - s)/G{b -s)-{l+e] 



When s < 6/2, 



E 



g{s + x) 

9{s) 



rs{X) 



< 1 - (1 + o{e))^Ji^^ — 4 + (2/" + 



^{s)G{b - s){l - e)- 
F{b-s) 



G{b-s) 



G{b-s) 



- (1 + 3e) 



fiF{b - s) 
l{s)G{b-s) 



Because 7(5) > 1 + 5e, for b large enough, E 



Then 



E 



gjs + x) 



L{X) 



rs{X) < 1, when s < 6/2. For s > 6/2, 

7(5) > k/4. 



, F(b-s) , , ,,F(b-s) 

< 1 - (1 + o{e))fi-^ f + {2fi + 0(1))- ^ ' 



G{b-s) 



Gib - s) 



+ o{l)ke 



_^F{b-s) 4(1 + 35) /iF(6 - s) 



G{b-s) 



K G{b-s) 



g{s+x) 

9{s) 



rs{X) 



< 1 when 



For any s > one can always choose k large enough such that E 
s > 6/2 and g{s) < 1. Therefore, 

E'^L^ < g{0) = (1 + 5e)i2-^G{b)^, 

for $b$ large enough. The conclusion then follows from Lemma 10 and Theorem 7. ■
Proof of Theorem 5. The conclusion is a direct application of Lemma 10 and Theorem 8.



Here we emphasize that the choices of parameters of the mixture family in the current
section are different from those in Section 5. In particular, for the regularly varying case with
$\iota \in (1.5, 2)$, in order to have finite expected termination time, we will have the importance
sampling distribution deviate from the zero-variance change of measure.

6.2 Conditional central limit theorem



The goal of this section is to provide a functional approximation to the joint distribution of

$$\{(\tau_b, S_{\lfloor u\tau_b \rfloor}, S_{\tau_b}) : u \in [0,1)\},$$

conditional on $\{\tau_b < \infty\}$ as $b \to \infty$. To make the discussion smooth, we postpone some
technical proofs to Appendix B.

For all the theorems so far, we assume either Assumption A or Assumptions B1-B3. In 
this section, in the setting of Assumption B, we will further impose Assumption B4. 

The approximation will be obtained based on a coupling of two processes governed according
to a probability measure which shall be denoted by $Q^*$. Our importance sampling
distribution induces a process that behaves most of the time like a regular random walk,
except that occasional large jumps occur with probability $p_{**}$. We will couple this process
with a regular random walk and argue that with high probability as $b \to \infty$ we have that $\tau_b$
coincides precisely with the first of such large jumps.

We now proceed to formalize this intuition. Consider the process $S = \{S_n : n \ge 0\}$,
where $S_n = X_1 + \cdots + X_n$, $S_0 = 0$, and we have that

$$Q^*(X_{n+1} \in dx \mid S_n = s) = q_s(x)\,dx = r_s^{-1}(x)f(x)\,dx. \quad (47)$$

The function $r_s^{-1}(x)$ is chosen to satisfy the conditions of Theorem 8. We shall slightly abuse
notation by letting $\tau_b = \inf\{n : S_n > b\}$.

We further introduce a random walk $\tilde S = \{\tilde S_n : n \ge 1\}$ such that $\tilde S_n = \tilde X_1 + \cdots + \tilde X_n$ and
with the property that the $\tilde X_j$'s are i.i.d. under $Q^*$ and have density

$$Q^*(\tilde X_i \in dx) = f(x)\,dx. \quad (48)$$

The joint law of $S$ and $\tilde S$ will be described next.
We first define

$$p(s) = \frac{p_*}{P(X \le b - s - \Lambda^{-1}(\Lambda(b-s) - a_*))}. \quad (49)$$

Note that by possibly increasing the selection of $\kappa$ and $\eta_* = \sup\{b - s : g(s) = 1\}$ in Theorem
8 we can always guarantee that $p(s) \in [0, 1]$. Actually $p(s) \to 1$ as $b - s \to \infty$. Next define

$$q_s^*(x) = I(p(s) < 1)(1 - p(s))^{-1}(q_s(x) - p(s)f(x)). \quad (50)$$

The next lemma shows that $q_s^*(\cdot)$ is a density function and provides a decomposition of $q_s(x)$
that will allow us to describe the joint law of $S$ and $\tilde S$. The proof of the lemma is given in
Appendix B.

Lemma 11 If $p(s) < 1$ we have that $q_s^*(\cdot)$ is a density function provided that $\kappa$ (and therefore
$\eta_*$) are chosen large enough. We thus have the mixture decomposition

$$q_s(x) = p(s)f(x) + (1 - p(s))q_s^*(x). \quad (51)$$

The processes $\tilde S$ and $S$ evolve jointly as follows under $Q^*$. First simply let $\tilde S$ evolve
according to (48). Now, at any given time $n + 1$ the evolution of $S$ obeys the following rule.
Given that $S_n = s$, $X_{n+1}$ is constructed as follows. First, we sample a Bernoulli random
variable to choose among $f(\cdot)$ and $q_s^*(\cdot)$ according to the probabilities $p(s)$ and $1 - p(s)$
respectively. If $f(\cdot)$ has been chosen, we let $X_{n+1} = \tilde X_{n+1}$. Otherwise, we construct $X_{n+1}$
from $q_s^*(\cdot)$ and $\tilde X_{n+1}$ from $f(x)$ independently. We further let

$$N_b = \inf\{n \ge 1 : X_n \ne \tilde X_n\},$$

which is the first time that $f(x)$ is not chosen. We intend to show that $P(N_b = \tau_b) \to 1$ as
$b \to \infty$. The result is summarized in the following lemmas and propositions whose proofs
are given in Appendix B.
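The joint construction just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: `p`, `sample_f` and `sample_q_star` are hypothetical callables standing in for $p(s)$, a sampler from the density $f$, and a sampler from $q_s^*$, none of which are specified by this sketch.

```python
import random


def coupled_step(s, p, sample_f, sample_q_star, rng):
    """Draw one joint increment (X, X_tilde, same) given the current S_n = s."""
    x_tilde = sample_f(rng)              # increment of the plain random walk
    if rng.random() < p(s):              # Bernoulli(p(s)): the density f is chosen
        return x_tilde, x_tilde, True    # both walks share the same increment
    # Otherwise X comes from q*_s, independently of X_tilde.
    return sample_q_star(s, rng), x_tilde, False


def first_decoupling_time(p, sample_f, sample_q_star, rng, max_steps):
    """Return (N, S_N, S_tilde_N); N is None if f was chosen at every step."""
    s = s_tilde = 0.0
    for n in range(1, max_steps + 1):
        x, x_tilde, same = coupled_step(s, p, sample_f, sample_q_star, rng)
        s += x
        s_tilde += x_tilde
        if not same:                     # first time f is not chosen
            return n, s, s_tilde
    return None, s, s_tilde


if __name__ == "__main__":
    rng = random.Random(1)
    f = lambda r: r.expovariate(1.0) - 2.0   # toy increment with negative mean
    q_star = lambda s, r: 100.0              # toy stand-in for the big-jump law
    print(first_decoupling_time(lambda s: 0.95, f, q_star, rng, 1000))
```

By construction each increment of `s` has the mixture law $p(s)f + (1-p(s))q_s^*$ while each increment of `s_tilde` has law $f$, and the two walks agree strictly before the returned time.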

Lemma 12

$$\lim_{b\to\infty} Q^*(N_b < \infty) = 1.$$

Lemma 13 Let $\varepsilon$ be chosen as in Theorem 8. There exists $b_0 > 0$ (depending on $a_{**}$ and $\varepsilon$)
and $\gamma(a_{**}, \varepsilon) > 0$ such that $\gamma(a_{**}, \varepsilon) \to 0$ as $a_{**} \to 0$ and $\varepsilon \to 0$, satisfying that

$$Q^*(\tau_b = N_b) \ge 1 - \gamma(a_{**}, \varepsilon),$$

for all $b > b_0$, where $\tau_b = \inf\{n \ge 1 : S_n > b\}$.

Now, we are ready to present the result which uses $\tilde S$ to approximate the process $S$ up
to time $\tau_b$.

Proposition 9 There exists a family of sets $\{B_b : b > 0\}$ such that $P(B_b) \to 1$ as $b \to \infty$
and with the property that for all $\tilde S \in B_b$

$$Q^*(N_b > ta(b) \mid \tilde S) = P(Z_\theta > t)(1 + o(1)),$$

as $b \to \infty$, where $a(x) = G(x)/F(x)$ and $\theta$ is defined in (H^ .

• Under Assumption A, 

29(t-l) 

P{Ze >t)= (1 + 

for allt > 0. 

• Under Assumptions Bl-4, 

P{Ze >t) = e"M. 

Proof of Theorem 6. Thanks to Theorem 8 the distribution of $\{S_n : 1 \le n \le \tau_b\}$ under
$Q^*$ converges in total variation to the distribution of $\{S_n : 1 \le n \le \tau_b\}$ given $\tau_b < \infty$ under
$P$. It is sufficient to show the limit theorem of $\{S_n : 1 \le n \le \tau_b\}$ under $Q^*$.
Thanks to Proposition 9 we are able to construct a random variable $Z_\theta$ following the
distributions stated in Proposition 9 such that $Z_\theta$ is independent of $\tilde S$ and

- -- ^ 0, 



a(6) 



almost surely as $b \to \infty$. Thanks to Lemma 13 we have that



a(6)' 



5 



0<i<l 



' a(6) 



a{b)' 



0<t<l 



a{b) 







in probability as $b \to \infty$ (in fact, the convergence holds for almost every $\tilde S$ in the sequence
$B_b$). Further, as $b \to \infty$, we can let $\theta \to -\mu/2$. So it is possible to construct a random
variable $Y_0$ independent of $\tilde S$ and following the distribution stated in the theorem such that

$$Z_\theta \to Y_0,$$

almost surely as $b \to \infty$. Now, using a standard strong approximation result (see for instance
[21]) we can (possibly by further enlarging the probability space) assume that

$$\tilde S_{\lfloor t \rfloor} = \mu t + \sigma B(t) + e(t) \quad (52)$$

where $e(\cdot)$ is a (random) function such that

$$\frac{e(xt)}{t^{1/2}} \to 0$$

with probability one uniformly on compact sets on $x \ge 0$ as $t \to \infty$. Therefore, we have that
StN, - tfiNb ^ aB {ta{b)YQ/\ii\ + ta{b%) + (ta(&)yo/|/i| + ta(&)eb) 

where — as 6 — > oo. For 5 arbitrarily small, we now verify that for each z > 5, 

B {ua{b)z + ua{b)E,b) — B {ua{b)z) 



sup 

0<M<1 



a{b)z 



0, 



as a(6) — oo. Given — )■ in probability, it suffices to bound the quantity 

B {ua{b)z) - B {sa{b)z) 



sup 

u,se{0,l),\u-s\<e/S 



\/a{b)z 

By the invariance principle the previous quantity equals in distribution to 



$$\sup_{u,s \in (0,1),\, |u-s| \le \varepsilon/\delta} |B(u) - B(s)|$$

which is precisely the modulus of continuity of Brownian motion evaluated at $\varepsilon/\delta$. By
continuity of Brownian motion, its modulus of continuity goes to zero almost surely as $\varepsilon \to 0$.
Consequently, we obtain



N, 



0. 



Because $Y_0$ is independent of $\tilde S$, using the invariance principle for Brownian motion, we have
that



Yo_ j Sta(b)Yo/M + ta{b)Yo \ Sn, - b \ _^ ( Y^ 



Now, we determine the joint distribution of $Y_0$ and $Y_1$. Note that $S_{N_b} - b$ satisfies

Sn^ ~b _ XN{b) + 'S'Ar(;,)-i — b 

a{b) a(b) 

In turn, we have, 



S 



N{b)~l 



a{b) 

in probability. In addition, the conditional distribution of $X_{N(b)}$ given $S_{N(b)-1}$ is asymptotically
(as $b \to \infty$) that of $X$ given that $X > b - S_{N(b)-1}$ and $S_{N(b)-1}$, where $X$ is a random variable
with density $f(\cdot)$ independent of $S_{N(b)-1}$. Therefore, the law of $(X_{N(b)} + S_{N(b)-1} - b)/a(b)$
given $S_{N(b)-1}$ can be approximated by that of $X/a(b) - Y_0 - b/a(b)$ given $Y_0$ and $X - Y_0a(b) > b$.

In the setting of Assumptions B1-B4, we establish in the proof of Proposition 9 that
$a(b) = (1 + o(1))/\lambda(b)$ as $b \to \infty$. Because of Assumption B1 we have that $a(b) = o(b)$.
Because of Assumption B4 we have that for each $y > 0$

$$Q^*(X > ya(b) + Y_0a(b) + b \mid X > b + Y_0a(b), Y_0) \to P(Y_1 > y) = \exp(-y) \quad (53)$$

as $b \to \infty$. Hence, $Y_1$ is an exponential random variable with expectation one and is
independent of $Y_0$.
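The two scaling facts used here, $a(b)\lambda(b) \to 1$ and $a(b) = o(b)$, can be seen in closed form on a simple example. The Python sketch below is illustrative only: it uses the tail $P(X > x) = e^{-\sqrt{x}}$, chosen because its integrated tail is explicit, and does not verify that this tail satisfies Assumptions B1-B4. The function names are hypothetical.

```python
import math


def tail(x):
    """P(X > x) = exp(-sqrt(x)) for x > 0 (illustrative Weibull-type tail)."""
    return math.exp(-math.sqrt(x))


def hazard(x):
    """lambda(x) = d/dx sqrt(x) = 1 / (2 sqrt(x))."""
    return 1.0 / (2.0 * math.sqrt(x))


def integrated_tail(x):
    """G(x) = int_x^inf exp(-sqrt(t)) dt = 2 (sqrt(x) + 1) exp(-sqrt(x))."""
    return 2.0 * (math.sqrt(x) + 1.0) * tail(x)


def mean_excess(x):
    """a(x) = G(x) / P(X > x); usable while exp(-sqrt(x)) does not underflow."""
    return integrated_tail(x) / tail(x)


if __name__ == "__main__":
    for x in (1e2, 1e3, 1e4):
        # Columns: x, a(x) * lambda(x) (tends to 1), a(x) / x (tends to 0).
        print(x, mean_excess(x) * hazard(x), mean_excess(x) / x)
```

Here $a(x) = 2(\sqrt{x} + 1)$, so $a(x)\lambda(x) = 1 + x^{-1/2}$ and $a(x)/x \to 0$, which matches the behaviour stated above for the Weibull-type setting.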

Now, suppose that Assumption A holds. We have that $a(b) = b/(\iota - 1) + o(b)$ as $b \to \infty$.
Therefore,

$$Q^*(X - (Y_0a(b) + b) > ya(b) \mid X > Y_0a(b) + b, Y_0 = y_0)$$
$$= (1 + o(1))\,Q^*(X - (Y_0 + \iota - 1)a(b) > ya(b) \mid X > (Y_0 + \iota - 1)a(b), Y_0 = y_0)$$
$$\to P(W > y/(y_0 + \iota - 1)),$$

where

$$P(W > t) = (1 + t)^{-\iota}$$

for $t \ge 0$. Now we need to verify that the law of $(Y_0, Y_1)$ as stated in the theorem coincides
with that of $(Y_0, W[Y_0 + (\iota - 1)])$. First we note that the joint density of $(Y_0, Y_1)$ is given by

$$\frac{\partial^2}{\partial y_0 \partial y_1} P(Y_0 > y_0, Y_1 > y_1) = \frac{\iota}{\iota - 1}\Big(1 + \frac{y_0 + y_1}{\iota - 1}\Big)^{-\iota-1}.$$

Therefore,

$$\frac{P(Y_1 \in dy_1 \mid Y_0 = y_0)}{dy_1} \propto (\iota - 1 + y_0 + y_1)^{-\iota-1}.$$

On the other hand,

$$\frac{P(W[y_0 + (\iota - 1)] \in dy_1)}{dy_1} \propto (1 + y_1/[y_0 + (\iota - 1)])^{-\iota-1} \propto (\iota - 1 + y_0 + y_1)^{-\iota-1}.$$

The independence between $B(t)$ and $(Y_0, Y_1)$ is straightforward. This concludes the proof of
the theorem. ■



7 Implementation and examples 

We implemented the algorithm and compared its performance with other existing algorithms
in the literature. In particular, we investigated two cases: regularly varying distribution and
Weibull-like distribution.

Regularly varying distribution. We consider increments with the representation

$$X_i = V_i - T_i,$$

where the $V_i$ are i.i.d. with distribution $P(V_i > v) = (1 + v)^{-2.5}$ for $v \ge 0$ and the $T_i$'s are
i.i.d. exponential random variables with expectation 4/3. It is not hard to verify that
$EX_i = -2/3$. In fact, this corresponds to the tail probability of the steady-state waiting
time of an M/G/1 queue. There are a few provably efficient algorithms in the literature.
Asmussen and Kroese (2006) (AK) [7] and Dupuis, Leder and Wang (2006) (DLW)
proposed efficient rare-event simulation estimators for geometric sums of regularly varying
random variables. Blanchet and Glynn (2008) (BG) [10] and Blanchet, Glynn, and Liu
(2007) (BGL) [15] proposed estimators for the tail of the steady-state G/G/1 waiting time.
Table 1 compares the performance of these algorithms. We use BL to denote the algorithm
proposed in the current paper, with one cut-off point $c_0 = 0.9(b - s)$.
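The increment distribution of this example is straightforward to sample by inverse transform. The Python sketch below is illustrative and is not the code used for Table 1: the function names are hypothetical, and the tail exponent 2.5 is an assumption, taken as the value for which $EV_i = 1/(2.5 - 1) = 2/3$ and hence $EX_i = 2/3 - 4/3 = -2/3$, as stated in the text.

```python
import random

ALPHA = 2.5          # assumed tail exponent: P(V > v) = (1 + v)**(-ALPHA)
MEAN_T = 4.0 / 3.0   # mean of the exponential variable T


def sample_v(rng, alpha=ALPHA):
    """Inverse-transform sample of V with P(V > v) = (1 + v)**(-alpha), v >= 0."""
    u = 1.0 - rng.random()               # uniform on (0, 1], never zero
    return u ** (-1.0 / alpha) - 1.0


def sample_increment(rng):
    """One increment X = V - T of the random walk."""
    return sample_v(rng) - rng.expovariate(1.0 / MEAN_T)


def theoretical_mean(alpha=ALPHA, mean_t=MEAN_T):
    """E[V] - E[T], using E[V] = 1 / (alpha - 1) for alpha > 1."""
    return 1.0 / (alpha - 1.0) - mean_t


if __name__ == "__main__":
    rng = random.Random(0)
    n = 200_000
    sample_mean = sum(sample_increment(rng) for _ in range(n)) / n
    print(theoretical_mean(), sample_mean)
```

A crude simulation like this is only useful for sanity-checking the drift; estimating the probabilities in Table 1 by naive sampling would be hopeless at these levels, which is the motivation for the importance sampling schemes compared there.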

Weibull-type distribution. For the Weibull-type case, we consider the increment to have
the following distribution,

$$P(X > t) = e^{-2\sqrt{t+1}},$$

for $t \ge -1$ and $EX_i = -1/2$. Table 2 compares the algorithm in this paper (BL) and that
of Blanchet and Glynn (2008) (BG). For the implementation, we choose $c_0 = \sqrt{b - s}$,
$c_1 = 0.1(b - s)$, $c_2 = 0.5(b - s)$, $c_3 = 0.9(b - s)$, $c_4 = b - s - \sqrt{b - s}$.

[Estimation]
[Std. Error]    b = 10^2       b = 10^3       b = 10^4

BL              1.047e-03      3.175e-05      9.877e-07
                3.76e-05       2.602e-07      8.187e-09

AK              1.199e-03      3.145e-05      9.980e-07
                1.479e-05      2.186e-07      6.945e-09

BG              1.079e-03      3.146e-05      9.980e-07
                5.968e-06      9.725e-08      2.073e-09

BGL             1.022e-03      3.167e-05      1.128e-06
                3.835e-05      1.598e-06      7.280e-08

DLW             1.046e-03      3.163e-05      9.905e-07
                5.195e-06      1.694e-07      2.993e-09

Table 1: Estimated tail probabilities of regularly varying random walks

[Estimation]
[Std. Error]    b = 250        b = 500        b = 650

BL              6.985e-13      1.778e-18      3.900e-21
                5.639e-14      1.936e-19      5.696e-22

BG              7.076e-13      1.897e-18      3.971e-21
                1.20e-14       5.083e-20      7.95e-23

Table 2: Estimated tail probabilities of the Weibull-type distribution
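As a rough plausibility check on Table 2, the estimates can be compared with the classical first-order approximation $P(\sup_n S_n > b) \approx G(b)/|\mu|$ for random walks with subexponential integrated tail. The Python sketch below is an illustration under an explicit assumption: it takes the Weibull-type tail to be $P(X > t) = e^{-2\sqrt{t+1}}$ for $t \ge -1$, for which $EX = -1/2$ and $G$ has a closed form. The function names are hypothetical, and the BL estimates are copied from Table 2.

```python
import math

MU = -0.5  # E[X] under the assumed tail P(X > t) = exp(-2 * sqrt(t + 1))


def integrated_tail(b):
    """G(b) = int_b^inf exp(-2 sqrt(t + 1)) dt = (sqrt(b + 1) + 1/2) exp(-2 sqrt(b + 1))."""
    u = math.sqrt(b + 1.0)
    return (u + 0.5) * math.exp(-2.0 * u)


def first_order_approximation(b):
    """First-order heavy-tailed approximation G(b) / |mu| of P(sup_n S_n > b)."""
    return integrated_tail(b) / abs(MU)


# BL estimates reported in Table 2.
TABLE_2_BL = {250: 6.985e-13, 500: 1.778e-18, 650: 3.900e-21}

if __name__ == "__main__":
    for b, estimate in TABLE_2_BL.items():
        approx = first_order_approximation(b)
        print(b, approx, estimate, approx / estimate)
```

The approximation is asymptotic, so only agreement in order of magnitude should be expected at these values of $b$; it is not a substitute for the standard errors reported in the table.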

A Technical proofs in Sections 3 and 4

Proof of Lemma 1. Observe that B2 implies $\log(\Lambda(x)/\Lambda(b_0)) \le \log((x/b_0)^{\beta_0})$. In other
words, $\Lambda(x) \le \Lambda(b_0)b_0^{-\beta_0}x^{\beta_0}$. Consequently, substituting into B2 we have that for $x \ge b_0$

$$\lambda(x) \le \beta_0\Lambda(x)/x \le \beta_0\Lambda(b_0)b_0^{-\beta_0}x^{\beta_0-1} = O(x^{\beta_0-1}).$$



Proof of Lemma [21 First, since G (■) is decreasing then for a; < 6 — A ^(A(6) — a* 

G{b-x) ^ G(A-i(A(6) - a,)) 



G{b) 



G{b) 



By continuity of G {■) it suffices to show that the right hand side is bounded for all b suffi- 
ciently large. Using L'Hopital's rule we conclude that 

G{A-'{A{b) -a,)) exp(-A(6) +a,) d 

—A [A [xj - a*^ 



Gib) 

Now, note that for all x > bo 
d 



exp(— A(6)) dx 
X{x) 



< 



A(x) 



dx^ ' ''*^ X (A-i (A (x) - a,)) - A (A"i (A (x)))
The inequality follows from the fact that $\lambda(\cdot)$ is non-increasing and $a_* > 0$. This allows us to
conclude the statement of the lemma. ■

Proof of Lemma 3. The second part, assuming that $F(\cdot)$ is regularly varying, follows from
Karamata's theorem. Now, for the non-regularly varying part, we simply note using L'Hopital's
rule and Lemma 1

$$\lim_{x\to\infty} \frac{F(x)}{G(x)} = \lim_{x\to\infty} \lambda(x) = 0.$$

The lower bound follows immediately. Again, using L'Hopital's rule, the upper bound then
follows from the fact that

$$\lim_{x\to\infty} \frac{xF(x)}{G(x)} = \lim_{x\to\infty} \frac{x\lambda(x)F(x) - F(x)}{F(x)} = \infty.$$

The last step is thanks to Assumption B1.
■

Proof of Lemma 4. This is a direct application of condition B2. Indeed, if $x \ge b_0 > 0$ and
$y \ge 0$

$$\log\Lambda(x + y) - \log\Lambda(x) = \int_x^{x+y} \partial\log\Lambda(t)\,dt \le \int_x^{x+y} \beta_0 t^{-1}\,dt = \beta_0\log\Big(\frac{x + y}{x}\Big),$$

which is equivalent to the statement of the lemma. ■

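Proof of Lemma 5. Equivalently, we must show that for $x$ sufficiently large

$$a_* \ge \Lambda(x) - \Lambda(x - x^{\alpha}),$$

where $\alpha = (1 - \beta_0)/2$. Now, note using Lemma 4 that

$$\Lambda(x) - \Lambda(x - x^{\alpha}) = \Lambda(x - x^{\alpha})\Big(\frac{\Lambda(x)}{\Lambda(x - x^{\alpha})} - 1\Big) \le \Lambda(x - x^{\alpha})\Big(\Big(\frac{x}{x - x^{\alpha}}\Big)^{\beta_0} - 1\Big).$$

For all $x$ sufficiently large, using a Taylor expansion, the right hand side is bounded by
$\Lambda(x - x^{\alpha})(2\beta_0 x^{\alpha-1})$. Consequently, once again applying Lemma 1 we conclude that

$$\Lambda(x) - \Lambda(x - x^{\alpha}) \le \Lambda(x - x^{\alpha})(2\beta_0 x^{\alpha-1}) \le 4\beta_0\Lambda(b_0)x^{\beta_0-1+\alpha}.$$

The right hand side goes to zero as $x \to \infty$ given our selection of $\alpha$ and therefore is less than
$a_*$ for all $x$ sufficiently large as required. ■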

Proof of Lemma 6. If Assumption A is satisfied then it is well known that both $F$ and $G$
are subexponential. Let us then assume that B2 holds, and then we obtain $x\lambda(x) \le \beta_0\Lambda(x)$
for all $x \ge b_0$ and $\beta_0 \in (0, 1)$. Applying Pitman's criterion (Proposition 1) and the fact that
(by Lemma 1 in particular $\lambda(x) = O(1)$ for $x \ge b_0$) it suffices to verify that

$$\int_{b_0}^{\infty} \exp(x\lambda(x) - \Lambda(x))\,dx < \infty.$$

Nevertheless, combining B1 and B2 we have that there exists $c \in (0, \infty)$ such that

POO POO 

exp (xA (x) - A (x)) dx< e^^'^^^^'^^rfx < c / x'^dx < oo 

60 Jbo Jbo 

and we conclude the lemma. 

For the subexponentiality of the integrated tail, it is sufficient to show that

limsu ^F{x) ^ ^ 

x^oo -G(x) log G'(x) 

and apply the same analysis as for the subexponentiality of $F$. By L'Hopital's rule (possibly
on a subsequence),

xFix) , xA(x) — 1 , xA(x) — 1 
limsup ^ , , , — ---^ < limsup -:; — ---^ < limsup -■; —rr < Po 

r^^oo -G(x)logG(x) x^oo l + logG(x) x^oo loge + logx - A(x) 

The second inequality is due to Lemma 3. The last inequality is from the fact that
$\log x = o(\Lambda(x))$ and Assumptions B1 and B2. $F(x)/G(x)$ and $-\log G(x)$ are the hazard function
and cumulative hazard function of the integrated tail. The proof is completely analogous
and therefore is omitted. ■

Proof of Lemma [9l Given Pq G (0, 1), one can always select ai as indicated in the 
statement of the lemma. Note that there exists a 5 > such that for all Ui < x < 1 — ai 

x^° + (1 - x)'^° >1 + S. 

So, by continuity and with ai small enough, we can find (72 > small enough so that 

x^° + (1 - X - ai/2f° > 1 + (T2. 

Therefore, we know that we can select 

aj = cij-i + cTi/2, 

as long as (Ti/2 < dj-i < 1 — 0-1/2. Now select k = [2(1 —(Ji)/a{\ and we have > 1 — cri/2. 



B Technical proofs in Section 6

Proof of Lemma 11. First it is straightforward to verify (51) out of definition (50). By
integrating both sides of (51), it is also immediate to see

$$\int q_s^*(x)\,dx = 1.$$

Now, we just need to verify that if $p(s) < 1$ then $(1 - p(s))q_s^*(x) \ge 0$. We concentrate on
the case in which Assumption B prevails (if Assumption A is in force the arguments carry
over in very similar forms). When $b - s > \eta_*$, using the definition of $q_s(x)$ given in Section
2.1 we obtain
r \ ^ < Co) 



P{X < Co) + P**f**'^^\-^'> + YlPjfA^l-^) 



P{X < Co) 

p^^f{x)I{x > Ck) _ P^f (x) I{X > Cfc) 

P{X>Ck) P{X<co) 
Pkfib - s- x)I{x e (cfc_i, Ck]) p*f (x) I{x e (cfc_i, Cfc]) 



PiX eib-s-Ck,b-s-Ck^i]) P(X<co) 

''^^ ' ' "'x e (cj_i,cj]) p*/(x)/(x e (cj_i,Cj]) 



/ Pjf{x)I{:^ 
2-^\ P(Y a 



-V P(Xg (c,_i,c,]) P(X<Co) 
Therefore, 

P^^f{x)I{x > Ck) P*f (x) /(x > Cfc) 



(1 -p(s))g* (x) 

+ 



P{X>Ck) P{X<co) 
Pkf{b - s- x)J(x e (cfc_i, Cfc]) pj (x) /(x e (cfc_i, Cfc]) 



PiX e{b-s-Ck,b-s-Ck^i]) P{X<co) 

,\^( Pifi^)Hx e (ci-i,Cj]) _ pj{x)l{x G (cj-i,Cj]) N 
+ Z^l P(XG(c,_i,c,]) P(X<Co) i- ^ ' 



To verify that $(1 - p(s))q_s^*(x) \ge 0$, the most interesting part involves the second line in the
above display corresponding to the interval $x \in (c_{k-1}, c_k]$. The reasoning for the rest of the
pieces is similar and therefore is omitted. On the interval $(c_{k-1}, c_k]$ we have that $b - s - x < x$
assuming that $b - s > \eta_*$ and $\eta_*$ is sufficiently large. Since $f(\cdot)$ is eventually decreasing (a
consequence of Assumption B3), then

$$f(b - s - x) \ge f(x),$$

when $x \in (c_{k-1}, c_k]$. Consequently

Pkfjb -s- x)/(x G (cfc-i, Cfc]) _ pj{x)l{x G (cfc-i,Cfc]) 
Pi^X e{b-s-Ck,b-s-Ck^i]) P(X<Co) 

Pkf (x) /(x G (cfc_i, Cfc]) p*/ (x) /(x G (cfc_i, Cfc]) 



> 



P(Xg (6-s-Cfc,6-s-Cfc_i]) P(X<co) 



Further, we have that pk = e^p** decreases to zero at most linearly in (6 — s)~^, whereas 
P{X G {b — s — Ck,b — s — Cfc„i]) goes to zero faster than any linear function of {b — s)~^. 
Therefore, $(1 - p(s))q_s^*(x)I(x \in (c_{k-1}, c_k]) \ge 0$. The remaining pieces in (54) are handled
similarly. ■
Proof of Lemma 12. Note that

pkb-] 

Q* {N, > kb) = E^* [HpiSj) I , (55) 

\j=o 

where p{s) is defined in fH9l) . In addition, for some e > 0, 

I m \ ( [fc^i 

\i=o / \i=o 

+ Q* ^sup \ Sj — — e max{j, 6} > 0^ . (56) 



Notice that for any e > 0, 



sup[|S'j — — emax{j, 6}] > I =0. 
i=i / 

Then, for some K sufficiently large (using an argument similar to that given in the proof of 
Proposition [9l) we conclude 



\kh-\ 

E"^' I IIp^^^^^^\^^ - ^ eme.x{j,b}) 1 < Kk' 



■eo 



J=0 



for some $\varepsilon_0$ small enough. This is because $1 - p(s) = (1 + o(1))p_{**}$ as $b - s \to \infty$ and $\varepsilon \to 0$.
Thereby, we conclude the proof applying the previous two estimates into (55) and (56). ■
Proof of Lemma 13. Let

ql{x)dx = R (s) 



b-s 

Note that for 6 — s > 77* we have that 

i?(s) = 0(e) + 6"'^**. (57) 

Let 

= inf{n > 1 : 5„ > 6}. 

Now observe that 

Q* in = Nk) = ^g*(iV, = k,Sk>b,n>k-l) 

k=l 

00 

> ^g*(iv, = k,Sk> b,Ti^^^ >k-i). 

k=l
Because of (57) we obtain that

oo 

J2Q*iN, = k,Sk>b,Ti_^^>k-l] 



k=l 



> (0(e) + e--*) J2 Q*iNb = k, t[_^^ >k-\) 



k=l 

= (0(e) + e-'^")Q*(r^^, >Nt,-l,m< oo) 
>(0(5) + e---+o(l))g*(r^^, = 00). 

The term $o(1) \to 0$ as $b \to \infty$ comes from Lemma 12, which shows that $Q^*(N_b = \infty) = o(1)$
as $b \to \infty$. Finally, we observe

$$Q^*(\tilde\tau_{b-\eta_*} = \infty) = 1 - u(b - \eta_*) \to 1,$$

as $b \to \infty$. The conclusion of this lemma follows. ■
Proof of Proposition 9. For $\delta_b = 1/\log b$, define

Bh = {S : \Sj - jfi\ < max(5^\ 6bj), I <j < ta{b)}. 
It is clear that limb^oo = 1. 

If $F$ is regularly varying, note that $1 - p(s) = (1 + o(1))p_{**}$ as $b - s \to \infty$, $\varepsilon \to 0$. For
all $\tilde S \in B_b$

lHb)\ ^ ( ltaib)\ p/, . I u 

g* [Nb > ta{b)\s) = n PiS,) = (1 + 0(1)) exp - 5^ ^^ Giblila ) 
By Karamata's theorem we have that 

If Assumptions B1-B4 hold, we clearly have that

, , G(x) 
F{x) 



POO 

/ P{X > X + t\X > x)dt 
Jo 

P{X >x + t/X{x)\X > x)dt. 



K^) Jo 

Now we can invoke Assumption B4 together with the dominated convergence theorem to 
conclude that 

00 POO 

P{X > x + t/X{x)\X > x)dt — > / exp{-t)dt=l 
Jo 




as $x \to \infty$. In addition, by the fundamental theorem of calculus we have that



A (x + y/\ (x)) - A (x) 



y 



A(x) 



A (x + yu/\ (x)) du 



and, in view of this representation, Assumption B4 is equivalent to stating that for each
$K \in (0, \infty)$



lim sup 



y/\{x) 



X {x + z) dz — y 



lim sup 



ya{x) 



a{x + z) 



dz-y 



0. (59) 



Observe that, since $\lambda(\cdot)$ is eventually non-increasing,

L*/^('')J rt/X{b) 



3=0 

We then conclude that 



rt/A(0) L*/A(b)J 

J2 A(6+(j + l)|/i|)< / A(6 + x|/i|)rfx< ^ A(6 + j|/i|). 



[t/X{b)\ 



0< / \{b + x\fi\)dx- V A(6+(j + l)|/i|) < A(6) 
as $b \to \infty$. Therefore, applying (58) and (59) we conclude that



ltaib)\ 

lim 7 



as 6 OO and consequently we have that for all S E Bi, 

[ta(b)i 



Q* (^Nk>ta{b)\S) = (l + o(l))exp 
We then conclude that 



j=0 



G{b + j\fi\) 



lim Q* (Nb> ta{b)\S) = e 

b-^oo 
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